We’ve looked at a few different ways in which we can build models this week, including how to prepare them properly. This weekend we’ll build a multiple linear regression model on a dataset which will need some preparation. The data can be found in the data folder, along with a data dictionary.
We want to investigate the avocado dataset and, in particular, to model the AveragePrice of the avocados. Use the tools we’ve worked with this week to prepare your dataset and find appropriate predictors. Once you’ve built your model, use the validation techniques discussed on Wednesday to evaluate it. Feel free to focus on building either an explanatory or a predictive model, or both if you are feeling energetic!
As part of the MVP we want you not just to run the code but also to have a go at interpreting the results, writing your thinking in comments in your script.
Hints and tips
region may lead to many dummy variables. Think carefully about whether to include this variable or not (there is no one ‘right’ answer to this!)
Date will not be needed in your models, but can you extract any useful features out of Date before you discard it?
leaps or glmulti to help with this.
Load libraries:
library(tidyverse)
## ── Attaching packages ──────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.0 ✓ purrr 0.3.3
## ✓ tibble 3.0.1 ✓ dplyr 0.8.5
## ✓ tidyr 1.0.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## Warning: package 'tibble' was built under R version 3.6.2
## ── Conflicts ─────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
##
## Attaching package: 'GGally'
## The following object is masked from 'package:dplyr':
##
## nasa
library(modelr)
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
Load dataset and examine it:
avocados <- clean_names(read_csv("data/avocado.csv"))
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## X1 = col_double(),
## Date = col_date(format = ""),
## AveragePrice = col_double(),
## `Total Volume` = col_double(),
## `4046` = col_double(),
## `4225` = col_double(),
## `4770` = col_double(),
## `Total Bags` = col_double(),
## `Small Bags` = col_double(),
## `Large Bags` = col_double(),
## `XLarge Bags` = col_double(),
## type = col_character(),
## year = col_double(),
## region = col_character()
## )
summary(avocados)
## x1 date average_price total_volume
## Min. : 0.00 Min. :2015-01-04 Min. :0.440 Min. : 85
## 1st Qu.:10.00 1st Qu.:2015-10-25 1st Qu.:1.100 1st Qu.: 10839
## Median :24.00 Median :2016-08-14 Median :1.370 Median : 107377
## Mean :24.23 Mean :2016-08-13 Mean :1.406 Mean : 850644
## 3rd Qu.:38.00 3rd Qu.:2017-06-04 3rd Qu.:1.660 3rd Qu.: 432962
## Max. :52.00 Max. :2018-03-25 Max. :3.250 Max. :62505647
## x4046 x4225 x4770 total_bags
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 854 1st Qu.: 3009 1st Qu.: 0 1st Qu.: 5089
## Median : 8645 Median : 29061 Median : 185 Median : 39744
## Mean : 293008 Mean : 295155 Mean : 22840 Mean : 239639
## 3rd Qu.: 111020 3rd Qu.: 150207 3rd Qu.: 6243 3rd Qu.: 110783
## Max. :22743616 Max. :20470573 Max. :2546439 Max. :19373134
## small_bags large_bags x_large_bags type
## Min. : 0 Min. : 0 Min. : 0.0 Length:18249
## 1st Qu.: 2849 1st Qu.: 127 1st Qu.: 0.0 Class :character
## Median : 26363 Median : 2648 Median : 0.0 Mode :character
## Mean : 182195 Mean : 54338 Mean : 3106.4
## 3rd Qu.: 83338 3rd Qu.: 22029 3rd Qu.: 132.5
## Max. :13384587 Max. :5719097 Max. :551693.7
## year region
## Min. :2015 Length:18249
## 1st Qu.:2015 Class :character
## Median :2016 Mode :character
## Mean :2016
## 3rd Qu.:2017
## Max. :2018
head(avocados)
avocados %>%
  distinct(region) %>%
  summarise(number_of_regions = n())
avocados %>%
  distinct(date) %>%
  summarise(
    number_of_dates = n(),
    min_date = min(date),
    max_date = max(date)
  )
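To see why a many-level categorical variable such as region deserves some thought, here is a small sketch of how a factor is expanded into dummy columns in a linear model. The toy factor `shop` and its levels are made up for illustration, and only base R is assumed.

```r
# Toy factor with 4 levels (hypothetical data, not from the avocado set)
toy <- data.frame(
  shop = factor(c("north", "south", "east", "west", "north", "east"))
)

# model.matrix() shows the design matrix lm() would build:
# an intercept plus one dummy column per non-reference level
design <- model.matrix(~ shop, data = toy)

colnames(design)
n_dummies <- ncol(design) - 1  # k levels give k - 1 dummy columns
n_dummies
```

The same rule applies to region, which is why it contributes 53 separate coefficients in the model summaries further on.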
The x1 variable looks like a row index left over from the source file (it arrived without a column name), so we’ll get rid of it. The region variable will lead to many categorical levels, but we can try leaving it in. We should also examine date and pull out whatever useful features we can.
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
trimmed_avocados <- avocados %>%
  mutate(
    quarter = as_factor(quarter(date)),
    year = as_factor(year),
    type = as_factor(type)
  ) %>%
  select(-c("x1", "date"))
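Quarter is only one feature that can be pulled out of a date. As a rough sketch, using base R only and three dates taken from the range shown in the summary output, other calendar features could be derived as follows. The column names `month` and `week_of_year` are illustrative choices and are not used elsewhere in this script.

```r
dates <- as.Date(c("2015-01-04", "2016-08-14", "2018-03-25"))

features <- data.frame(
  date = dates,
  # calendar month as a factor so lm() would treat it as categorical
  month = factor(as.integer(format(dates, "%m")), levels = 1:12),
  # quarter label from base R, e.g. "Q1"
  quarter = quarters(dates),
  # week of the year (00-53, weeks starting on Sunday)
  week_of_year = as.integer(format(dates, "%U"))
)

features
```

A 12-level month factor would add 11 dummy columns where quarter adds 3, so finer granularity comes at the cost of more coefficients.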
Now let’s check for aliased variables (i.e. combinations of variables in which one or more of the variables can be calculated exactly from other variables):
alias(average_price ~ ., data = trimmed_avocados)
## Model :
## average_price ~ total_volume + x4046 + x4225 + x4770 + total_bags +
## small_bags + large_bags + x_large_bags + type + year + region +
## quarter
Nice, we don’t find any aliases.
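For comparison, this is roughly what alias() reports when an exact linear dependence is present. The toy columns `a`, `b` and `total` are invented for the example, with `total` deliberately built as `a + b`.

```r
set.seed(1)
toy <- data.frame(a = rnorm(20), b = rnorm(20))
toy$total <- toy$a + toy$b               # exact linear combination of a and b
toy$y <- 2 * toy$a - toy$b + rnorm(20)   # response with some noise

res <- alias(y ~ ., data = toy)

# Non-NULL here: 'total' can be written in terms of the other predictors
res$Complete
```

In the avocado output above only the Model entry is printed, which is what we would expect when no such dependence exists.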
Run ggpairs() on the remaining variables (leave out region, we’ll boxplot average_price with region next):
trimmed_avocados %>%
  select(-region) %>%
  ggpairs()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Let’s save that plot so we can zoom in on it more easily:
ggsave("pairs_plot_choice1.png", width = 10, height = 10, units = "in")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
trimmed_avocados %>%
  ggplot(aes(x = region, y = average_price)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
Test competing models with x4046, type, year, quarter and region:
model1a <- lm(average_price ~ x4046, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model1a)
summary(model1a)
##
## Call:
## lm(formula = average_price ~ x4046, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.98539 -0.29842 -0.03531 0.25459 1.82475
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.425e+00 2.993e-03 476.29 <2e-16 ***
## x4046 -6.631e-08 2.305e-09 -28.77 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3939 on 18247 degrees of freedom
## Multiple R-squared: 0.0434, Adjusted R-squared: 0.04334
## F-statistic: 827.8 on 1 and 18247 DF, p-value: < 2.2e-16
model1b <- lm(average_price ~ type, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model1b)
summary(model1b)
##
## Call:
## lm(formula = average_price ~ type, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.21400 -0.20400 -0.02804 0.18600 1.59600
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.158040 0.003321 348.7 <2e-16 ***
## typeorganic 0.495959 0.004697 105.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3173 on 18247 degrees of freedom
## Multiple R-squared: 0.3793, Adjusted R-squared: 0.3792
## F-statistic: 1.115e+04 on 1 and 18247 DF, p-value: < 2.2e-16
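As an interpretation aid: because type is the only predictor in this model, the two coefficients translate directly into a fitted price for each type. The numbers below are copied from the summary output above; the variable names are just for illustration, and the reference level is taken to be conventional avocados.

```r
intercept   <- 1.158040   # fitted mean price for the reference level (conventional)
typeorganic <- 0.495959   # estimated difference for organic vs the reference level

price_conventional <- intercept
price_organic      <- intercept + typeorganic

c(conventional = price_conventional, organic = price_organic)
```

So the model puts organic avocados at roughly 0.50 above conventional ones on average, in the units of average_price.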
model1c <- lm(average_price ~ year, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model1c)
summary(model1c)
##
## Call:
## lm(formula = average_price ~ year, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.07513 -0.29513 -0.03559 0.25247 1.91136
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.375590 0.005280 260.546 < 2e-16 ***
## year2016 -0.036951 0.007466 -4.949 7.52e-07 ***
## year2017 0.139537 0.007432 18.776 < 2e-16 ***
## year2018 -0.028060 0.012192 -2.301 0.0214 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3956 on 18245 degrees of freedom
## Multiple R-squared: 0.03489, Adjusted R-squared: 0.03474
## F-statistic: 219.9 on 3 and 18245 DF, p-value: < 2.2e-16
model1d <- lm(average_price ~ quarter, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model1d)
summary(model1d)
##
## Call:
## lm(formula = average_price ~ quarter, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.96859 -0.30503 -0.02859 0.25497 1.79497
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.306605 0.005316 245.769 <2e-16 ***
## quarter2 0.068428 0.008077 8.472 <2e-16 ***
## quarter3 0.206308 0.008076 25.545 <2e-16 ***
## quarter4 0.151983 0.008019 18.952 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3946 on 18245 degrees of freedom
## Multiple R-squared: 0.04006, Adjusted R-squared: 0.03991
## F-statistic: 253.8 on 3 and 18245 DF, p-value: < 2.2e-16
model1e <- lm(average_price ~ region, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model1e)
summary(model1e)
##
## Call:
## lm(formula = average_price ~ region, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.97095 -0.28423 -0.03432 0.25207 1.76115
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.561036 0.020006 78.029 < 2e-16 ***
## regionAtlanta -0.223077 0.028293 -7.885 3.33e-15 ***
## regionBaltimoreWashington -0.026805 0.028293 -0.947 0.34344
## regionBoise -0.212899 0.028293 -7.525 5.52e-14 ***
## regionBoston -0.030148 0.028293 -1.066 0.28663
## regionBuffaloRochester -0.044201 0.028293 -1.562 0.11824
## regionCalifornia -0.165710 0.028293 -5.857 4.79e-09 ***
## regionCharlotte 0.045000 0.028293 1.591 0.11173
## regionChicago -0.004260 0.028293 -0.151 0.88031
## regionCincinnatiDayton -0.351834 0.028293 -12.436 < 2e-16 ***
## regionColumbus -0.308254 0.028293 -10.895 < 2e-16 ***
## regionDallasFtWorth -0.475444 0.028293 -16.805 < 2e-16 ***
## regionDenver -0.342456 0.028293 -12.104 < 2e-16 ***
## regionDetroit -0.284941 0.028293 -10.071 < 2e-16 ***
## regionGrandRapids -0.056036 0.028293 -1.981 0.04765 *
## regionGreatLakes -0.222485 0.028293 -7.864 3.94e-15 ***
## regionHarrisburgScranton -0.047751 0.028293 -1.688 0.09147 .
## regionHartfordSpringfield 0.257604 0.028293 9.105 < 2e-16 ***
## regionHouston -0.513107 0.028293 -18.136 < 2e-16 ***
## regionIndianapolis -0.247041 0.028293 -8.732 < 2e-16 ***
## regionJacksonville -0.050089 0.028293 -1.770 0.07668 .
## regionLasVegas -0.180118 0.028293 -6.366 1.98e-10 ***
## regionLosAngeles -0.345030 0.028293 -12.195 < 2e-16 ***
## regionLouisville -0.274349 0.028293 -9.697 < 2e-16 ***
## regionMiamiFtLauderdale -0.132544 0.028293 -4.685 2.82e-06 ***
## regionMidsouth -0.156272 0.028293 -5.523 3.37e-08 ***
## regionNashville -0.348935 0.028293 -12.333 < 2e-16 ***
## regionNewOrleansMobile -0.256243 0.028293 -9.057 < 2e-16 ***
## regionNewYork 0.166538 0.028293 5.886 4.02e-09 ***
## regionNortheast 0.040888 0.028293 1.445 0.14843
## regionNorthernNewEngland -0.083639 0.028293 -2.956 0.00312 **
## regionOrlando -0.054822 0.028293 -1.938 0.05268 .
## regionPhiladelphia 0.071095 0.028293 2.513 0.01199 *
## regionPhoenixTucson -0.336598 0.028293 -11.897 < 2e-16 ***
## regionPittsburgh -0.196716 0.028293 -6.953 3.70e-12 ***
## regionPlains -0.124527 0.028293 -4.401 1.08e-05 ***
## regionPortland -0.243314 0.028293 -8.600 < 2e-16 ***
## regionRaleighGreensboro -0.005917 0.028293 -0.209 0.83434
## regionRichmondNorfolk -0.269704 0.028293 -9.533 < 2e-16 ***
## regionRoanoke -0.313107 0.028293 -11.067 < 2e-16 ***
## regionSacramento 0.060533 0.028293 2.140 0.03241 *
## regionSanDiego -0.162870 0.028293 -5.757 8.72e-09 ***
## regionSanFrancisco 0.243166 0.028293 8.595 < 2e-16 ***
## regionSeattle -0.118462 0.028293 -4.187 2.84e-05 ***
## regionSouthCarolina -0.157751 0.028293 -5.576 2.50e-08 ***
## regionSouthCentral -0.459793 0.028293 -16.251 < 2e-16 ***
## regionSoutheast -0.163018 0.028293 -5.762 8.45e-09 ***
## regionSpokane -0.115444 0.028293 -4.080 4.52e-05 ***
## regionStLouis -0.130414 0.028293 -4.609 4.06e-06 ***
## regionSyracuse -0.040710 0.028293 -1.439 0.15020
## regionTampa -0.152189 0.028293 -5.379 7.58e-08 ***
## regionTotalUS -0.242012 0.028293 -8.554 < 2e-16 ***
## regionWest -0.288817 0.028293 -10.208 < 2e-16 ***
## regionWestTexNewMexico -0.299334 0.028356 -10.556 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3678 on 18195 degrees of freedom
## Multiple R-squared: 0.1681, Adjusted R-squared: 0.1657
## F-statistic: 69.38 on 53 and 18195 DF, p-value: < 2.2e-16
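The single-predictor summaries above can be compared more compactly by pulling out adjusted R-squared programmatically. The following is a minimal sketch of that pattern on the built-in mtcars data, which stands in for the avocado data so that the example runs on its own; the predictor names come from mtcars, not from this analysis.

```r
# mtcars stands in for the avocado data so the sketch is self-contained
candidates <- c("wt", "hp", "qsec")

adj_r2 <- sapply(candidates, function(predictor) {
  # reformulate() builds the formula mpg ~ <predictor>
  fit <- lm(reformulate(predictor, response = "mpg"), data = mtcars)
  summary(fit)$adj.r.squared
})

# Highest adjusted R-squared first
sort(adj_r2, decreasing = TRUE)
```

Applied to this script, the same loop would run over x4046, type, year, quarter and region with average_price as the response.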
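model1b with type is the best of the single-predictor models (adjusted R-squared of 0.379, against 0.166 for region and under 0.05 for the others), so we’ll keep that and re-run ggpairs() on the residuals (again omitting region).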
avocados_remaining_resid <- trimmed_avocados %>%
  add_residuals(model1b) %>%
  select(-c("average_price", "type", "region"))
ggpairs(avocados_remaining_resid)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggsave("pairs_plot_choice2.png", width = 10, height = 10, units = "in")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
trimmed_avocados %>%
  add_residuals(model1b) %>%
  ggplot(aes(x = region, y = resid)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
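The idea behind these residual plots is that a remaining variable is worth adding if it explains variation the current model leaves behind. A numeric version of that check, sketched here on mtcars as a stand-in and with base R’s resid() in place of add_residuals(), is to correlate the residuals with each unused numeric predictor.

```r
# Current model: one predictor already chosen
base_fit <- lm(mpg ~ wt, data = mtcars)

# Numeric predictors not yet in the model (names are from mtcars)
remaining <- c("hp", "qsec", "drat")
resid_cor <- sapply(remaining, function(v) {
  cor(resid(base_fit), mtcars[[v]])
})

# A larger absolute correlation suggests a stronger next candidate
sort(abs(resid_cor), decreasing = TRUE)
```

This only picks up linear association with numeric variables; for a factor such as region, the boxplot of residuals remains the more natural check.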
Looks like x4046, year, quarter and region are our next strong contenders:
model2a <- lm(average_price ~ type + x4046, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model2a)
summary(model2a)
##
## Call:
## lm(formula = average_price ~ type + x4046, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.21416 -0.20029 -0.02736 0.18591 1.59589
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.171e+00 3.485e-03 336.13 <2e-16 ***
## typeorganic 4.827e-01 4.802e-03 100.52 <2e-16 ***
## x4046 -2.323e-08 1.898e-09 -12.24 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.316 on 18246 degrees of freedom
## Multiple R-squared: 0.3843, Adjusted R-squared: 0.3843
## F-statistic: 5695 on 2 and 18246 DF, p-value: < 2.2e-16
model2b <- lm(average_price ~ type + year, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model2b)
summary(model2b)
##
## Call:
## lm(formula = average_price ~ type + year, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.32320 -0.18722 -0.01722 0.18278 1.66337
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.127645 0.004704 239.735 < 2e-16 ***
## typeorganic 0.495980 0.004563 108.685 < 2e-16 ***
## year2016 -0.036995 0.005817 -6.360 2.07e-10 ***
## year2017 0.139580 0.005790 24.107 < 2e-16 ***
## year2018 -0.028104 0.009499 -2.959 0.00309 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3082 on 18244 degrees of freedom
## Multiple R-squared: 0.4142, Adjusted R-squared: 0.4141
## F-statistic: 3225 on 4 and 18244 DF, p-value: < 2.2e-16
model2c <- lm(average_price ~ type + quarter, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model2c)
summary(model2c)
##
## Call:
## lm(formula = average_price ~ type + quarter, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.11458 -0.20089 -0.02458 0.18542 1.54687
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.058626 0.004718 224.38 <2e-16 ***
## typeorganic 0.495958 0.004543 109.16 <2e-16 ***
## quarter2 0.068546 0.006282 10.91 <2e-16 ***
## quarter3 0.206308 0.006281 32.84 <2e-16 ***
## quarter4 0.152040 0.006237 24.38 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3069 on 18244 degrees of freedom
## Multiple R-squared: 0.4193, Adjusted R-squared: 0.4192
## F-statistic: 3294 on 4 and 18244 DF, p-value: < 2.2e-16
model2d <- lm(average_price ~ type + region, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model2d)
summary(model2d)
##
## Call:
## lm(formula = average_price ~ type + region, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.09858 -0.16716 -0.01814 0.14692 1.51320
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.313079 0.014894 88.159 < 2e-16 ***
## typeorganic 0.495912 0.004017 123.452 < 2e-16 ***
## regionAtlanta -0.223077 0.020871 -10.688 < 2e-16 ***
## regionBaltimoreWashington -0.026805 0.020871 -1.284 0.19906
## regionBoise -0.212899 0.020871 -10.201 < 2e-16 ***
## regionBoston -0.030148 0.020871 -1.444 0.14863
## regionBuffaloRochester -0.044201 0.020871 -2.118 0.03421 *
## regionCalifornia -0.165710 0.020871 -7.940 2.15e-15 ***
## regionCharlotte 0.045000 0.020871 2.156 0.03109 *
## regionChicago -0.004260 0.020871 -0.204 0.83826
## regionCincinnatiDayton -0.351834 0.020871 -16.857 < 2e-16 ***
## regionColumbus -0.308254 0.020871 -14.769 < 2e-16 ***
## regionDallasFtWorth -0.475444 0.020871 -22.780 < 2e-16 ***
## regionDenver -0.342456 0.020871 -16.408 < 2e-16 ***
## regionDetroit -0.284941 0.020871 -13.652 < 2e-16 ***
## regionGrandRapids -0.056036 0.020871 -2.685 0.00726 **
## regionGreatLakes -0.222485 0.020871 -10.660 < 2e-16 ***
## regionHarrisburgScranton -0.047751 0.020871 -2.288 0.02216 *
## regionHartfordSpringfield 0.257604 0.020871 12.342 < 2e-16 ***
## regionHouston -0.513107 0.020871 -24.584 < 2e-16 ***
## regionIndianapolis -0.247041 0.020871 -11.836 < 2e-16 ***
## regionJacksonville -0.050089 0.020871 -2.400 0.01641 *
## regionLasVegas -0.180118 0.020871 -8.630 < 2e-16 ***
## regionLosAngeles -0.345030 0.020871 -16.531 < 2e-16 ***
## regionLouisville -0.274349 0.020871 -13.145 < 2e-16 ***
## regionMiamiFtLauderdale -0.132544 0.020871 -6.351 2.20e-10 ***
## regionMidsouth -0.156272 0.020871 -7.487 7.35e-14 ***
## regionNashville -0.348935 0.020871 -16.718 < 2e-16 ***
## regionNewOrleansMobile -0.256243 0.020871 -12.277 < 2e-16 ***
## regionNewYork 0.166538 0.020871 7.979 1.56e-15 ***
## regionNortheast 0.040888 0.020871 1.959 0.05013 .
## regionNorthernNewEngland -0.083639 0.020871 -4.007 6.16e-05 ***
## regionOrlando -0.054822 0.020871 -2.627 0.00863 **
## regionPhiladelphia 0.071095 0.020871 3.406 0.00066 ***
## regionPhoenixTucson -0.336598 0.020871 -16.127 < 2e-16 ***
## regionPittsburgh -0.196716 0.020871 -9.425 < 2e-16 ***
## regionPlains -0.124527 0.020871 -5.966 2.47e-09 ***
## regionPortland -0.243314 0.020871 -11.658 < 2e-16 ***
## regionRaleighGreensboro -0.005917 0.020871 -0.284 0.77679
## regionRichmondNorfolk -0.269704 0.020871 -12.922 < 2e-16 ***
## regionRoanoke -0.313107 0.020871 -15.002 < 2e-16 ***
## regionSacramento 0.060533 0.020871 2.900 0.00373 **
## regionSanDiego -0.162870 0.020871 -7.803 6.35e-15 ***
## regionSanFrancisco 0.243166 0.020871 11.651 < 2e-16 ***
## regionSeattle -0.118462 0.020871 -5.676 1.40e-08 ***
## regionSouthCarolina -0.157751 0.020871 -7.558 4.28e-14 ***
## regionSouthCentral -0.459793 0.020871 -22.030 < 2e-16 ***
## regionSoutheast -0.163018 0.020871 -7.811 6.00e-15 ***
## regionSpokane -0.115444 0.020871 -5.531 3.22e-08 ***
## regionStLouis -0.130414 0.020871 -6.248 4.24e-10 ***
## regionSyracuse -0.040710 0.020871 -1.951 0.05113 .
## regionTampa -0.152189 0.020871 -7.292 3.18e-13 ***
## regionTotalUS -0.242012 0.020871 -11.595 < 2e-16 ***
## regionWest -0.288817 0.020871 -13.838 < 2e-16 ***
## regionWestTexNewMexico -0.297114 0.020918 -14.204 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2713 on 18194 degrees of freedom
## Multiple R-squared: 0.5473, Adjusted R-squared: 0.546
## F-statistic: 407.4 on 54 and 18194 DF, p-value: < 2.2e-16
So model2d with type and region comes out best here, with an adjusted R-squared of 0.546. Some region coefficients are not significant at the \(0.05\) level, so let’s run an anova() to test whether region as a whole is worth including:
anova(model1b, model2d)
It seems region is significant overall, so we’ll keep it in!
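To illustrate why the overall test is the right tool here, the sketch below builds a toy dataset in which one level of a factor has a clear effect and another has none, then compares the nested models with anova(). All names and numbers in it are invented for the example.

```r
set.seed(42)
n <- 120
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = n / 3)),
  x = rnorm(n)
)
# Group "b" is shifted upwards; "c" has the same true mean as "a",
# so its individual coefficient may well look non-significant
toy$y <- 1 + 0.5 * toy$x + ifelse(toy$group == "b", 1, 0) + rnorm(n, sd = 0.5)

small <- lm(y ~ x, data = toy)
large <- lm(y ~ x + group, data = toy)

# F-test of the factor as a whole: does adding 'group' reduce residual
# variation by more than chance alone would explain?
comparison <- anova(small, large)
comparison
p_value <- comparison$`Pr(>F)`[2]
```

A small p-value in the second row supports keeping the whole factor, even when some of its individual levels are indistinguishable from the reference level.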
avocados_remaining_resid <- trimmed_avocados %>%
  add_residuals(model2d) %>%
  select(-c("average_price", "type", "region"))
ggpairs(avocados_remaining_resid)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggsave("pairs_plot_choice3.png", width = 10, height = 10, units = "in")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The next contender variables look to be x_large_bags, year and quarter. Let’s try them out.
model3a <- lm(average_price ~ type + region + x_large_bags, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model3a)
summary(model3a)
##
## Call:
## lm(formula = average_price ~ type + region + x_large_bags, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.10024 -0.16726 -0.01734 0.14591 1.51156
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.311e+00 1.489e-02 88.033 < 2e-16 ***
## typeorganic 5.001e-01 4.101e-03 121.953 < 2e-16 ***
## regionAtlanta -2.235e-01 2.086e-02 -10.718 < 2e-16 ***
## regionBaltimoreWashington -2.713e-02 2.086e-02 -1.301 0.193298
## regionBoise -2.128e-01 2.086e-02 -10.204 < 2e-16 ***
## regionBoston -3.023e-02 2.086e-02 -1.449 0.147234
## regionBuffaloRochester -4.428e-02 2.086e-02 -2.123 0.033774 *
## regionCalifornia -1.762e-01 2.096e-02 -8.408 < 2e-16 ***
## regionCharlotte 4.495e-02 2.086e-02 2.155 0.031177 *
## regionChicago -4.936e-03 2.086e-02 -0.237 0.812924
## regionCincinnatiDayton -3.523e-01 2.086e-02 -16.890 < 2e-16 ***
## regionColumbus -3.086e-01 2.086e-02 -14.796 < 2e-16 ***
## regionDallasFtWorth -4.762e-01 2.086e-02 -22.832 < 2e-16 ***
## regionDenver -3.425e-01 2.086e-02 -16.420 < 2e-16 ***
## regionDetroit -2.882e-01 2.087e-02 -13.810 < 2e-16 ***
## regionGrandRapids -5.764e-02 2.086e-02 -2.763 0.005731 **
## regionGreatLakes -2.353e-01 2.101e-02 -11.198 < 2e-16 ***
## regionHarrisburgScranton -4.798e-02 2.086e-02 -2.300 0.021451 *
## regionHartfordSpringfield 2.575e-01 2.086e-02 12.347 < 2e-16 ***
## regionHouston -5.137e-01 2.086e-02 -24.628 < 2e-16 ***
## regionIndianapolis -2.475e-01 2.086e-02 -11.867 < 2e-16 ***
## regionJacksonville -5.021e-02 2.086e-02 -2.407 0.016074 *
## regionLasVegas -1.801e-01 2.086e-02 -8.633 < 2e-16 ***
## regionLosAngeles -3.532e-01 2.092e-02 -16.881 < 2e-16 ***
## regionLouisville -2.745e-01 2.086e-02 -13.160 < 2e-16 ***
## regionMiamiFtLauderdale -1.331e-01 2.086e-02 -6.380 1.81e-10 ***
## regionMidsouth -1.590e-01 2.086e-02 -7.619 2.68e-14 ***
## regionNashville -3.491e-01 2.086e-02 -16.736 < 2e-16 ***
## regionNewOrleansMobile -2.572e-01 2.086e-02 -12.330 < 2e-16 ***
## regionNewYork 1.659e-01 2.086e-02 7.954 1.91e-15 ***
## regionNortheast 3.834e-02 2.086e-02 1.838 0.066151 .
## regionNorthernNewEngland -8.377e-02 2.086e-02 -4.017 5.93e-05 ***
## regionOrlando -5.523e-02 2.086e-02 -2.648 0.008111 **
## regionPhiladelphia 7.097e-02 2.086e-02 3.403 0.000669 ***
## regionPhoenixTucson -3.368e-01 2.086e-02 -16.149 < 2e-16 ***
## regionPittsburgh -1.967e-01 2.086e-02 -9.433 < 2e-16 ***
## regionPlains -1.267e-01 2.086e-02 -6.072 1.29e-09 ***
## regionPortland -2.434e-01 2.086e-02 -11.669 < 2e-16 ***
## regionRaleighGreensboro -6.021e-03 2.086e-02 -0.289 0.772828
## regionRichmondNorfolk -2.699e-01 2.086e-02 -12.939 < 2e-16 ***
## regionRoanoke -3.132e-01 2.086e-02 -15.015 < 2e-16 ***
## regionSacramento 6.020e-02 2.086e-02 2.886 0.003904 **
## regionSanDiego -1.631e-01 2.086e-02 -7.819 5.64e-15 ***
## regionSanFrancisco 2.428e-01 2.086e-02 11.642 < 2e-16 ***
## regionSeattle -1.185e-01 2.086e-02 -5.682 1.35e-08 ***
## regionSouthCarolina -1.581e-01 2.086e-02 -7.581 3.59e-14 ***
## regionSouthCentral -4.650e-01 2.088e-02 -22.268 < 2e-16 ***
## regionSoutheast -1.680e-01 2.088e-02 -8.046 9.10e-16 ***
## regionSpokane -1.154e-01 2.086e-02 -5.531 3.22e-08 ***
## regionStLouis -1.308e-01 2.086e-02 -6.270 3.69e-10 ***
## regionSyracuse -4.071e-02 2.086e-02 -1.952 0.050993 .
## regionTampa -1.526e-01 2.086e-02 -7.315 2.68e-13 ***
## regionTotalUS -2.852e-01 2.255e-02 -12.648 < 2e-16 ***
## regionWest -2.904e-01 2.086e-02 -13.922 < 2e-16 ***
## regionWestTexNewMexico -2.976e-01 2.090e-02 -14.238 < 2e-16 ***
## x_large_bags 6.810e-07 1.351e-07 5.040 4.70e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2711 on 18193 degrees of freedom
## Multiple R-squared: 0.548, Adjusted R-squared: 0.5466
## F-statistic: 401 on 55 and 18193 DF, p-value: < 2.2e-16
model3b <- lm(average_price ~ type + region + year, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model3b)
summary(model3b)
##
## Call:
## lm(formula = average_price ~ type + region + year, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1532 -0.1497 -0.0060 0.1419 1.4849
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.282672 0.014600 87.857 < 2e-16 ***
## typeorganic 0.495933 0.003859 128.501 < 2e-16 ***
## regionAtlanta -0.223077 0.020052 -11.125 < 2e-16 ***
## regionBaltimoreWashington -0.026805 0.020052 -1.337 0.181322
## regionBoise -0.212899 0.020052 -10.617 < 2e-16 ***
## regionBoston -0.030148 0.020052 -1.503 0.132735
## regionBuffaloRochester -0.044201 0.020052 -2.204 0.027515 *
## regionCalifornia -0.165710 0.020052 -8.264 < 2e-16 ***
## regionCharlotte 0.045000 0.020052 2.244 0.024835 *
## regionChicago -0.004260 0.020052 -0.212 0.831748
## regionCincinnatiDayton -0.351834 0.020052 -17.546 < 2e-16 ***
## regionColumbus -0.308254 0.020052 -15.373 < 2e-16 ***
## regionDallasFtWorth -0.475444 0.020052 -23.710 < 2e-16 ***
## regionDenver -0.342456 0.020052 -17.078 < 2e-16 ***
## regionDetroit -0.284941 0.020052 -14.210 < 2e-16 ***
## regionGrandRapids -0.056036 0.020052 -2.794 0.005204 **
## regionGreatLakes -0.222485 0.020052 -11.095 < 2e-16 ***
## regionHarrisburgScranton -0.047751 0.020052 -2.381 0.017259 *
## regionHartfordSpringfield 0.257604 0.020052 12.847 < 2e-16 ***
## regionHouston -0.513107 0.020052 -25.589 < 2e-16 ***
## regionIndianapolis -0.247041 0.020052 -12.320 < 2e-16 ***
## regionJacksonville -0.050089 0.020052 -2.498 0.012501 *
## regionLasVegas -0.180118 0.020052 -8.982 < 2e-16 ***
## regionLosAngeles -0.345030 0.020052 -17.207 < 2e-16 ***
## regionLouisville -0.274349 0.020052 -13.682 < 2e-16 ***
## regionMiamiFtLauderdale -0.132544 0.020052 -6.610 3.95e-11 ***
## regionMidsouth -0.156272 0.020052 -7.793 6.88e-15 ***
## regionNashville -0.348935 0.020052 -17.401 < 2e-16 ***
## regionNewOrleansMobile -0.256243 0.020052 -12.779 < 2e-16 ***
## regionNewYork 0.166538 0.020052 8.305 < 2e-16 ***
## regionNortheast 0.040888 0.020052 2.039 0.041459 *
## regionNorthernNewEngland -0.083639 0.020052 -4.171 3.05e-05 ***
## regionOrlando -0.054822 0.020052 -2.734 0.006263 **
## regionPhiladelphia 0.071095 0.020052 3.545 0.000393 ***
## regionPhoenixTucson -0.336598 0.020052 -16.786 < 2e-16 ***
## regionPittsburgh -0.196716 0.020052 -9.810 < 2e-16 ***
## regionPlains -0.124527 0.020052 -6.210 5.41e-10 ***
## regionPortland -0.243314 0.020052 -12.134 < 2e-16 ***
## regionRaleighGreensboro -0.005917 0.020052 -0.295 0.767930
## regionRichmondNorfolk -0.269704 0.020052 -13.450 < 2e-16 ***
## regionRoanoke -0.313107 0.020052 -15.615 < 2e-16 ***
## regionSacramento 0.060533 0.020052 3.019 0.002542 **
## regionSanDiego -0.162870 0.020052 -8.122 4.86e-16 ***
## regionSanFrancisco 0.243166 0.020052 12.127 < 2e-16 ***
## regionSeattle -0.118462 0.020052 -5.908 3.53e-09 ***
## regionSouthCarolina -0.157751 0.020052 -7.867 3.83e-15 ***
## regionSouthCentral -0.459793 0.020052 -22.930 < 2e-16 ***
## regionSoutheast -0.163018 0.020052 -8.130 4.58e-16 ***
## regionSpokane -0.115444 0.020052 -5.757 8.69e-09 ***
## regionStLouis -0.130414 0.020052 -6.504 8.04e-11 ***
## regionSyracuse -0.040710 0.020052 -2.030 0.042350 *
## regionTampa -0.152189 0.020052 -7.590 3.36e-14 ***
## regionTotalUS -0.242012 0.020052 -12.069 < 2e-16 ***
## regionWest -0.288817 0.020052 -14.403 < 2e-16 ***
## regionWestTexNewMexico -0.296552 0.020097 -14.756 < 2e-16 ***
## year2016 -0.036970 0.004920 -7.515 5.96e-14 ***
## year2017 0.139555 0.004897 28.500 < 2e-16 ***
## year2018 -0.028078 0.008033 -3.495 0.000475 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2607 on 18191 degrees of freedom
## Multiple R-squared: 0.5822, Adjusted R-squared: 0.5809
## F-statistic: 444.8 on 57 and 18191 DF, p-value: < 2.2e-16
model3c <- lm(average_price ~ type + region + quarter, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model3c)
summary(model3c)
##
## Call:
## lm(formula = average_price ~ type + region + quarter, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.06767 -0.15971 -0.01185 0.14629 1.54411
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.213689 0.014517 83.603 < 2e-16 ***
## typeorganic 0.495911 0.003835 129.296 < 2e-16 ***
## regionAtlanta -0.223077 0.019928 -11.194 < 2e-16 ***
## regionBaltimoreWashington -0.026805 0.019928 -1.345 0.178619
## regionBoise -0.212899 0.019928 -10.683 < 2e-16 ***
## regionBoston -0.030148 0.019928 -1.513 0.130339
## regionBuffaloRochester -0.044201 0.019928 -2.218 0.026565 *
## regionCalifornia -0.165710 0.019928 -8.315 < 2e-16 ***
## regionCharlotte 0.045000 0.019928 2.258 0.023950 *
## regionChicago -0.004260 0.019928 -0.214 0.830716
## regionCincinnatiDayton -0.351834 0.019928 -17.655 < 2e-16 ***
## regionColumbus -0.308254 0.019928 -15.468 < 2e-16 ***
## regionDallasFtWorth -0.475444 0.019928 -23.858 < 2e-16 ***
## regionDenver -0.342456 0.019928 -17.185 < 2e-16 ***
## regionDetroit -0.284941 0.019928 -14.298 < 2e-16 ***
## regionGrandRapids -0.056036 0.019928 -2.812 0.004931 **
## regionGreatLakes -0.222485 0.019928 -11.164 < 2e-16 ***
## regionHarrisburgScranton -0.047751 0.019928 -2.396 0.016577 *
## regionHartfordSpringfield 0.257604 0.019928 12.927 < 2e-16 ***
## regionHouston -0.513107 0.019928 -25.748 < 2e-16 ***
## regionIndianapolis -0.247041 0.019928 -12.397 < 2e-16 ***
## regionJacksonville -0.050089 0.019928 -2.513 0.011963 *
## regionLasVegas -0.180118 0.019928 -9.038 < 2e-16 ***
## regionLosAngeles -0.345030 0.019928 -17.314 < 2e-16 ***
## regionLouisville -0.274349 0.019928 -13.767 < 2e-16 ***
## regionMiamiFtLauderdale -0.132544 0.019928 -6.651 2.99e-11 ***
## regionMidsouth -0.156272 0.019928 -7.842 4.69e-15 ***
## regionNashville -0.348935 0.019928 -17.510 < 2e-16 ***
## regionNewOrleansMobile -0.256243 0.019928 -12.858 < 2e-16 ***
## regionNewYork 0.166538 0.019928 8.357 < 2e-16 ***
## regionNortheast 0.040888 0.019928 2.052 0.040208 *
## regionNorthernNewEngland -0.083639 0.019928 -4.197 2.72e-05 ***
## regionOrlando -0.054822 0.019928 -2.751 0.005947 **
## regionPhiladelphia 0.071095 0.019928 3.568 0.000361 ***
## regionPhoenixTucson -0.336598 0.019928 -16.891 < 2e-16 ***
## regionPittsburgh -0.196716 0.019928 -9.871 < 2e-16 ***
## regionPlains -0.124527 0.019928 -6.249 4.23e-10 ***
## regionPortland -0.243314 0.019928 -12.210 < 2e-16 ***
## regionRaleighGreensboro -0.005917 0.019928 -0.297 0.766527
## regionRichmondNorfolk -0.269704 0.019928 -13.534 < 2e-16 ***
## regionRoanoke -0.313107 0.019928 -15.712 < 2e-16 ***
## regionSacramento 0.060533 0.019928 3.038 0.002389 **
## regionSanDiego -0.162870 0.019928 -8.173 3.21e-16 ***
## regionSanFrancisco 0.243166 0.019928 12.202 < 2e-16 ***
## regionSeattle -0.118462 0.019928 -5.944 2.82e-09 ***
## regionSouthCarolina -0.157751 0.019928 -7.916 2.59e-15 ***
## regionSouthCentral -0.459793 0.019928 -23.073 < 2e-16 ***
## regionSoutheast -0.163018 0.019928 -8.180 3.02e-16 ***
## regionSpokane -0.115444 0.019928 -5.793 7.03e-09 ***
## regionStLouis -0.130414 0.019928 -6.544 6.14e-11 ***
## regionSyracuse -0.040710 0.019928 -2.043 0.041082 *
## regionTampa -0.152189 0.019928 -7.637 2.33e-14 ***
## regionTotalUS -0.242012 0.019928 -12.144 < 2e-16 ***
## regionWest -0.288817 0.019928 -14.493 < 2e-16 ***
## regionWestTexNewMexico -0.297141 0.019973 -14.877 < 2e-16 ***
## quarter2 0.068479 0.005303 12.912 < 2e-16 ***
## quarter3 0.206308 0.005303 38.906 < 2e-16 ***
## quarter4 0.152007 0.005265 28.869 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2591 on 18191 degrees of freedom
## Multiple R-squared: 0.5874, Adjusted R-squared: 0.5861
## F-statistic: 454.3 on 57 and 18191 DF, p-value: < 2.2e-16
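So model3c, with type, region and quarter, wins out here: its adjusted R-squared (0.5861) beats both model3b with year (0.5809) and the model with x_large_bags (0.5466). The diagnostics still look reasonable, with perhaps some mild heteroscedasticity.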
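The comparison above was read off three separate summaries. As a minimal sketch, the same numbers can be gathered into one table. The built-in `mtcars` data stands in for `trimmed_avocados` here, and the model names in the list are made up for illustration.

```r
# Stand-in data: the built-in mtcars dataset replaces trimmed_avocados here
cars <- transform(mtcars, cyl = factor(cyl), am = factor(am))

# Three candidate models, each adding one predictor to the same base model
candidates <- list(
  with_wt = lm(mpg ~ cyl + wt, data = cars),
  with_hp = lm(mpg ~ cyl + hp, data = cars),
  with_am = lm(mpg ~ cyl + am, data = cars)
)

# Collect adjusted R-squared and residual standard error for each candidate
comparison <- data.frame(
  model     = names(candidates),
  adj_r_sq  = sapply(candidates, function(m) summary(m)$adj.r.squared),
  resid_se  = sapply(candidates, function(m) summary(m)$sigma),
  row.names = NULL
)

# Sort so that the candidate with the highest adjusted R-squared comes first
comparison <- comparison[order(-comparison$adj_r_sq), ]
print(comparison)
```

With the avocado models, the list would hold the three model3 candidates instead, and the top row would be the one to carry forward.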
avocados_remaining_resid <- trimmed_avocados %>%
add_residuals(model3c) %>%
select(-c("average_price", "type", "region", "quarter"))
ggpairs(avocados_remaining_resid)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggsave("pairs_plot_choice4.png", width = 10, height = 10, units = "in")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The contender variables here are x_large_bags and year, so let’s try them out.
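The pairs plot picks contenders by eye. For numeric predictors, a rough numeric companion is to correlate each remaining variable with the residuals of the current model. The sketch below assumes `mtcars` as a stand-in for `trimmed_avocados`, and `current_model` is a made-up name playing the role of model3c.

```r
# Stand-in data: mtcars replaces trimmed_avocados
cars <- transform(mtcars, cyl = factor(cyl))

# Current model, playing the role of model3c
current_model <- lm(mpg ~ cyl + wt, data = cars)

# Residuals: the part of the response the current model does not explain
cars$resid <- resid(current_model)

# Numeric predictors that are not yet in the model
remaining <- c("disp", "hp", "drat", "qsec")

# Correlation of each remaining predictor with the residuals,
# ordered by absolute size so the strongest contender comes first
resid_cor <- sapply(remaining, function(v) cor(cars[[v]], cars$resid))
resid_cor <- resid_cor[order(-abs(resid_cor))]
print(round(resid_cor, 3))
```

This only works for numeric columns. Categorical contenders such as year still need the boxplot panels of the pairs plot.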
model4a <- lm(average_price ~ type + region + quarter + x_large_bags, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model4a)
summary(model4a)
##
## Call:
## lm(formula = average_price ~ type + region + quarter + x_large_bags,
## data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.06889 -0.16013 -0.01154 0.14553 1.54291
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.212e+00 1.451e-02 83.493 < 2e-16 ***
## typeorganic 4.998e-01 3.916e-03 127.614 < 2e-16 ***
## regionAtlanta -2.235e-01 1.992e-02 -11.222 < 2e-16 ***
## regionBaltimoreWashington -2.711e-02 1.992e-02 -1.361 0.173535
## regionBoise -2.128e-01 1.992e-02 -10.687 < 2e-16 ***
## regionBoston -3.022e-02 1.992e-02 -1.518 0.129137
## regionBuffaloRochester -4.427e-02 1.992e-02 -2.223 0.026233 *
## regionCalifornia -1.753e-01 2.002e-02 -8.759 < 2e-16 ***
## regionCharlotte 4.495e-02 1.992e-02 2.257 0.024015 *
## regionChicago -4.877e-03 1.992e-02 -0.245 0.806549
## regionCincinnatiDayton -3.522e-01 1.992e-02 -17.686 < 2e-16 ***
## regionColumbus -3.086e-01 1.992e-02 -15.494 < 2e-16 ***
## regionDallasFtWorth -4.762e-01 1.992e-02 -23.908 < 2e-16 ***
## regionDenver -3.425e-01 1.992e-02 -17.196 < 2e-16 ***
## regionDetroit -2.879e-01 1.993e-02 -14.449 < 2e-16 ***
## regionGrandRapids -5.750e-02 1.992e-02 -2.887 0.003898 **
## regionGreatLakes -2.342e-01 2.006e-02 -11.671 < 2e-16 ***
## regionHarrisburgScranton -4.796e-02 1.992e-02 -2.408 0.016054 *
## regionHartfordSpringfield 2.575e-01 1.992e-02 12.931 < 2e-16 ***
## regionHouston -5.136e-01 1.992e-02 -25.789 < 2e-16 ***
## regionIndianapolis -2.475e-01 1.992e-02 -12.426 < 2e-16 ***
## regionJacksonville -5.020e-02 1.992e-02 -2.521 0.011720 *
## regionLasVegas -1.801e-01 1.992e-02 -9.041 < 2e-16 ***
## regionLosAngeles -3.524e-01 1.998e-02 -17.644 < 2e-16 ***
## regionLouisville -2.745e-01 1.992e-02 -13.781 < 2e-16 ***
## regionMiamiFtLauderdale -1.330e-01 1.992e-02 -6.679 2.47e-11 ***
## regionMidsouth -1.587e-01 1.992e-02 -7.967 1.72e-15 ***
## regionNashville -3.491e-01 1.992e-02 -17.527 < 2e-16 ***
## regionNewOrleansMobile -2.571e-01 1.992e-02 -12.909 < 2e-16 ***
## regionNewYork 1.660e-01 1.992e-02 8.333 < 2e-16 ***
## regionNortheast 3.856e-02 1.992e-02 1.936 0.052939 .
## regionNorthernNewEngland -8.376e-02 1.992e-02 -4.206 2.61e-05 ***
## regionOrlando -5.519e-02 1.992e-02 -2.771 0.005592 **
## regionPhiladelphia 7.098e-02 1.992e-02 3.564 0.000366 ***
## regionPhoenixTucson -3.368e-01 1.992e-02 -16.911 < 2e-16 ***
## regionPittsburgh -1.967e-01 1.992e-02 -9.879 < 2e-16 ***
## regionPlains -1.265e-01 1.992e-02 -6.350 2.20e-10 ***
## regionPortland -2.434e-01 1.992e-02 -12.220 < 2e-16 ***
## regionRaleighGreensboro -6.012e-03 1.992e-02 -0.302 0.762753
## regionRichmondNorfolk -2.699e-01 1.992e-02 -13.549 < 2e-16 ***
## regionRoanoke -3.132e-01 1.992e-02 -15.725 < 2e-16 ***
## regionSacramento 6.023e-02 1.992e-02 3.024 0.002497 **
## regionSanDiego -1.631e-01 1.992e-02 -8.187 2.85e-16 ***
## regionSanFrancisco 2.429e-01 1.992e-02 12.194 < 2e-16 ***
## regionSeattle -1.185e-01 1.992e-02 -5.950 2.72e-09 ***
## regionSouthCarolina -1.581e-01 1.992e-02 -7.938 2.18e-15 ***
## regionSouthCentral -4.646e-01 1.994e-02 -23.297 < 2e-16 ***
## regionSoutheast -1.676e-01 1.994e-02 -8.404 < 2e-16 ***
## regionSpokane -1.154e-01 1.992e-02 -5.793 7.02e-09 ***
## regionStLouis -1.307e-01 1.992e-02 -6.565 5.35e-11 ***
## regionSyracuse -4.071e-02 1.992e-02 -2.044 0.040974 *
## regionTampa -1.525e-01 1.992e-02 -7.659 1.96e-14 ***
## regionTotalUS -2.814e-01 2.153e-02 -13.068 < 2e-16 ***
## regionWest -2.903e-01 1.992e-02 -14.573 < 2e-16 ***
## regionWestTexNewMexico -2.976e-01 1.996e-02 -14.910 < 2e-16 ***
## quarter2 6.806e-02 5.301e-03 12.839 < 2e-16 ***
## quarter3 2.055e-01 5.302e-03 38.761 < 2e-16 ***
## quarter4 1.527e-01 5.264e-03 29.001 < 2e-16 ***
## x_large_bags 6.215e-07 1.292e-07 4.810 1.52e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2589 on 18190 degrees of freedom
## Multiple R-squared: 0.5879, Adjusted R-squared: 0.5866
## F-statistic: 447.4 on 58 and 18190 DF, p-value: < 2.2e-16
model4b <- lm(average_price ~ type + region + quarter + year, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model4b)
summary(model4b)
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year,
## data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.03683 -0.14588 -0.00412 0.14386 1.43930
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.167184 0.014290 81.677 < 2e-16 ***
## typeorganic 0.495930 0.003675 134.950 < 2e-16 ***
## regionAtlanta -0.223077 0.019094 -11.683 < 2e-16 ***
## regionBaltimoreWashington -0.026805 0.019094 -1.404 0.160383
## regionBoise -0.212899 0.019094 -11.150 < 2e-16 ***
## regionBoston -0.030148 0.019094 -1.579 0.114368
## regionBuffaloRochester -0.044201 0.019094 -2.315 0.020627 *
## regionCalifornia -0.165710 0.019094 -8.679 < 2e-16 ***
## regionCharlotte 0.045000 0.019094 2.357 0.018445 *
## regionChicago -0.004260 0.019094 -0.223 0.823439
## regionCincinnatiDayton -0.351834 0.019094 -18.427 < 2e-16 ***
## regionColumbus -0.308254 0.019094 -16.144 < 2e-16 ***
## regionDallasFtWorth -0.475444 0.019094 -24.900 < 2e-16 ***
## regionDenver -0.342456 0.019094 -17.935 < 2e-16 ***
## regionDetroit -0.284941 0.019094 -14.923 < 2e-16 ***
## regionGrandRapids -0.056036 0.019094 -2.935 0.003342 **
## regionGreatLakes -0.222485 0.019094 -11.652 < 2e-16 ***
## regionHarrisburgScranton -0.047751 0.019094 -2.501 0.012397 *
## regionHartfordSpringfield 0.257604 0.019094 13.491 < 2e-16 ***
## regionHouston -0.513107 0.019094 -26.873 < 2e-16 ***
## regionIndianapolis -0.247041 0.019094 -12.938 < 2e-16 ***
## regionJacksonville -0.050089 0.019094 -2.623 0.008716 **
## regionLasVegas -0.180118 0.019094 -9.433 < 2e-16 ***
## regionLosAngeles -0.345030 0.019094 -18.070 < 2e-16 ***
## regionLouisville -0.274349 0.019094 -14.368 < 2e-16 ***
## regionMiamiFtLauderdale -0.132544 0.019094 -6.942 4.00e-12 ***
## regionMidsouth -0.156272 0.019094 -8.184 2.91e-16 ***
## regionNashville -0.348935 0.019094 -18.275 < 2e-16 ***
## regionNewOrleansMobile -0.256243 0.019094 -13.420 < 2e-16 ***
## regionNewYork 0.166538 0.019094 8.722 < 2e-16 ***
## regionNortheast 0.040888 0.019094 2.141 0.032255 *
## regionNorthernNewEngland -0.083639 0.019094 -4.380 1.19e-05 ***
## regionOrlando -0.054822 0.019094 -2.871 0.004094 **
## regionPhiladelphia 0.071095 0.019094 3.723 0.000197 ***
## regionPhoenixTucson -0.336598 0.019094 -17.629 < 2e-16 ***
## regionPittsburgh -0.196716 0.019094 -10.303 < 2e-16 ***
## regionPlains -0.124527 0.019094 -6.522 7.13e-11 ***
## regionPortland -0.243314 0.019094 -12.743 < 2e-16 ***
## regionRaleighGreensboro -0.005917 0.019094 -0.310 0.756641
## regionRichmondNorfolk -0.269704 0.019094 -14.125 < 2e-16 ***
## regionRoanoke -0.313107 0.019094 -16.398 < 2e-16 ***
## regionSacramento 0.060533 0.019094 3.170 0.001526 **
## regionSanDiego -0.162870 0.019094 -8.530 < 2e-16 ***
## regionSanFrancisco 0.243166 0.019094 12.735 < 2e-16 ***
## regionSeattle -0.118462 0.019094 -6.204 5.62e-10 ***
## regionSouthCarolina -0.157751 0.019094 -8.262 < 2e-16 ***
## regionSouthCentral -0.459793 0.019094 -24.081 < 2e-16 ***
## regionSoutheast -0.163018 0.019094 -8.538 < 2e-16 ***
## regionSpokane -0.115444 0.019094 -6.046 1.51e-09 ***
## regionStLouis -0.130414 0.019094 -6.830 8.75e-12 ***
## regionSyracuse -0.040710 0.019094 -2.132 0.033011 *
## regionTampa -0.152189 0.019094 -7.971 1.67e-15 ***
## regionTotalUS -0.242012 0.019094 -12.675 < 2e-16 ***
## regionWest -0.288817 0.019094 -15.126 < 2e-16 ***
## regionWestTexNewMexico -0.296624 0.019137 -15.500 < 2e-16 ***
## quarter2 0.081121 0.005410 14.996 < 2e-16 ***
## quarter3 0.218901 0.005409 40.471 < 2e-16 ***
## quarter4 0.161972 0.005376 30.130 < 2e-16 ***
## year2016 -0.036978 0.004684 -7.894 3.10e-15 ***
## year2017 0.138658 0.004663 29.735 < 2e-16 ***
## year2018 0.087412 0.008334 10.488 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2482 on 18188 degrees of freedom
## Multiple R-squared: 0.6213, Adjusted R-squared: 0.62
## F-statistic: 497.3 on 60 and 18188 DF, p-value: < 2.2e-16
Hmm, model4b with type, region, quarter and year wins here: its adjusted R-squared is 0.62, against 0.5866 for model4a with x_large_bags.
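Because year enters the model as several dummy coefficients, it can help to test them jointly rather than reading each p-value separately. A minimal sketch of a nested-model F-test with `anova()` follows, again using `mtcars` as a stand-in: `gear` plays the role of year, and the names `smaller` and `larger` are illustrative.

```r
# Stand-in data: mtcars replaces trimmed_avocados; gear plays the role of year
cars <- transform(mtcars, cyl = factor(cyl), gear = factor(gear))

# Nested pair of models: the larger one adds a single categorical predictor
smaller <- lm(mpg ~ cyl + wt, data = cars)
larger  <- lm(mpg ~ cyl + wt + gear, data = cars)

# Partial F-test: do the added dummy variables jointly improve the fit?
nested_test <- anova(smaller, larger)
print(nested_test)

# Row 2 holds the test of the larger model against the smaller one
added_df  <- nested_test$Df[2]
f_p_value <- nested_test[["Pr(>F)"]][2]
```

For the avocado data, the equivalent call would compare model3c with model4b. A small p-value there would support keeping year as a whole.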
We are likely now pursuing variables with rather limited explanatory power, but let’s check for one more main effect.
avocados_remaining_resid <- trimmed_avocados %>%
add_residuals(model4b) %>%
select(-c("average_price", "type", "region", "quarter", "year"))
ggpairs(avocados_remaining_resid)
ggsave("pairs_plot_choice5.png", width = 10, height = 10, units = "in")
It looks like x_large_bags is the remaining contender, so let’s check it out!
model5 <- lm(average_price ~ type + region + quarter + year + x_large_bags, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5)
summary(model5)
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## x_large_bags, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.03610 -0.14545 -0.00439 0.14420 1.43907
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.167e+00 1.429e-02 81.687 < 2e-16 ***
## typeorganic 4.982e-01 3.755e-03 132.674 < 2e-16 ***
## regionAtlanta -2.233e-01 1.909e-02 -11.698 < 2e-16 ***
## regionBaltimoreWashington -2.698e-02 1.909e-02 -1.413 0.157614
## regionBoise -2.129e-01 1.909e-02 -11.151 < 2e-16 ***
## regionBoston -3.019e-02 1.909e-02 -1.582 0.113769
## regionBuffaloRochester -4.424e-02 1.909e-02 -2.318 0.020485 *
## regionCalifornia -1.713e-01 1.919e-02 -8.925 < 2e-16 ***
## regionCharlotte 4.497e-02 1.909e-02 2.356 0.018493 *
## regionChicago -4.616e-03 1.909e-02 -0.242 0.808941
## regionCincinnatiDayton -3.521e-01 1.909e-02 -18.442 < 2e-16 ***
## regionColumbus -3.084e-01 1.909e-02 -16.157 < 2e-16 ***
## regionDallasFtWorth -4.759e-01 1.909e-02 -24.926 < 2e-16 ***
## regionDenver -3.425e-01 1.909e-02 -17.940 < 2e-16 ***
## regionDetroit -2.866e-01 1.910e-02 -15.008 < 2e-16 ***
## regionGrandRapids -5.688e-02 1.909e-02 -2.979 0.002894 **
## regionGreatLakes -2.292e-01 1.923e-02 -11.918 < 2e-16 ***
## regionHarrisburgScranton -4.787e-02 1.909e-02 -2.508 0.012166 *
## regionHartfordSpringfield 2.576e-01 1.909e-02 13.492 < 2e-16 ***
## regionHouston -5.134e-01 1.909e-02 -26.894 < 2e-16 ***
## regionIndianapolis -2.473e-01 1.909e-02 -12.954 < 2e-16 ***
## regionJacksonville -5.015e-02 1.909e-02 -2.627 0.008615 **
## regionLasVegas -1.801e-01 1.909e-02 -9.434 < 2e-16 ***
## regionLosAngeles -3.493e-01 1.915e-02 -18.243 < 2e-16 ***
## regionLouisville -2.744e-01 1.909e-02 -14.375 < 2e-16 ***
## regionMiamiFtLauderdale -1.328e-01 1.909e-02 -6.958 3.58e-12 ***
## regionMidsouth -1.577e-01 1.910e-02 -8.257 < 2e-16 ***
## regionNashville -3.490e-01 1.909e-02 -18.282 < 2e-16 ***
## regionNewOrleansMobile -2.567e-01 1.909e-02 -13.448 < 2e-16 ***
## regionNewYork 1.662e-01 1.909e-02 8.706 < 2e-16 ***
## regionNortheast 3.955e-02 1.910e-02 2.071 0.038381 *
## regionNorthernNewEngland -8.371e-02 1.909e-02 -4.385 1.17e-05 ***
## regionOrlando -5.503e-02 1.909e-02 -2.883 0.003945 **
## regionPhiladelphia 7.103e-02 1.909e-02 3.721 0.000199 ***
## regionPhoenixTucson -3.367e-01 1.909e-02 -17.638 < 2e-16 ***
## regionPittsburgh -1.967e-01 1.909e-02 -10.305 < 2e-16 ***
## regionPlains -1.257e-01 1.909e-02 -6.581 4.80e-11 ***
## regionPortland -2.434e-01 1.909e-02 -12.748 < 2e-16 ***
## regionRaleighGreensboro -5.972e-03 1.909e-02 -0.313 0.754415
## regionRichmondNorfolk -2.698e-01 1.909e-02 -14.132 < 2e-16 ***
## regionRoanoke -3.131e-01 1.909e-02 -16.404 < 2e-16 ***
## regionSacramento 6.036e-02 1.909e-02 3.162 0.001571 **
## regionSanDiego -1.630e-01 1.909e-02 -8.537 < 2e-16 ***
## regionSanFrancisco 2.430e-01 1.909e-02 12.728 < 2e-16 ***
## regionSeattle -1.185e-01 1.909e-02 -6.207 5.52e-10 ***
## regionSouthCarolina -1.579e-01 1.909e-02 -8.274 < 2e-16 ***
## regionSouthCentral -4.625e-01 1.911e-02 -24.199 < 2e-16 ***
## regionSoutheast -1.656e-01 1.911e-02 -8.667 < 2e-16 ***
## regionSpokane -1.154e-01 1.909e-02 -6.045 1.52e-09 ***
## regionStLouis -1.306e-01 1.909e-02 -6.842 8.08e-12 ***
## regionSyracuse -4.071e-02 1.909e-02 -2.132 0.032984 *
## regionTampa -1.524e-01 1.909e-02 -7.983 1.52e-15 ***
## regionTotalUS -2.647e-01 2.066e-02 -12.815 < 2e-16 ***
## regionWest -2.897e-01 1.909e-02 -15.171 < 2e-16 ***
## regionWestTexNewMexico -2.969e-01 1.913e-02 -15.518 < 2e-16 ***
## quarter2 8.058e-02 5.412e-03 14.891 < 2e-16 ***
## quarter3 2.181e-01 5.414e-03 40.293 < 2e-16 ***
## quarter4 1.621e-01 5.375e-03 30.154 < 2e-16 ***
## year2016 -3.791e-02 4.695e-03 -8.075 7.16e-16 ***
## year2017 1.375e-01 4.680e-03 29.381 < 2e-16 ***
## year2018 8.547e-02 8.360e-03 10.223 < 2e-16 ***
## x_large_bags 3.583e-07 1.246e-07 2.877 0.004025 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2482 on 18187 degrees of freedom
## Multiple R-squared: 0.6214, Adjusted R-squared: 0.6202
## F-statistic: 489.4 on 61 and 18187 DF, p-value: < 2.2e-16
It is a significant explanatory variable (p = 0.004), so let’s keep it, although the adjusted R-squared barely moves (0.62 to 0.6202). Overall, we still have some heteroscedasticity and deviations from normality in the residuals.
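The heteroscedasticity noted above was judged from the diagnostic plots. One way to put a number on it is an auxiliary regression of the squared residuals on the predictors, in the spirit of the studentized Breusch-Pagan test. This is a sketch under assumptions, not the method used in this script: `mtcars` stands in for `trimmed_avocados`, and all object names are illustrative.

```r
# Stand-in data: mtcars replaces trimmed_avocados
cars <- transform(mtcars, cyl = factor(cyl))

# Model whose residual variance we want to check
model <- lm(mpg ~ cyl + wt, data = cars)

# Auxiliary regression: squared residuals on the same predictors
cars$sq_resid <- resid(model)^2
aux <- lm(sq_resid ~ cyl + wt, data = cars)

# Test statistic: n times the R-squared of the auxiliary regression,
# compared to a chi-squared distribution with one degree of freedom
# per auxiliary slope coefficient
n       <- nrow(cars)
lm_stat <- n * summary(aux)$r.squared
aux_df  <- length(coef(aux)) - 1
p_value <- pchisq(lm_stat, df = aux_df, lower.tail = FALSE)

print(c(statistic = lm_stat, df = aux_df, p_value = p_value))
```

A small p-value would suggest that the residual variance changes with the predictors. The `lmtest` package offers `bptest()` for this kind of check if installing a package is an option.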
Let’s now think about possible pair interactions: for five main effect variables we have ten possible pair interactions. Let’s test them out.
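The count of ten comes from choosing 2 of the 5 main effects. As a small sketch, `combn()` can list the pairs and build one candidate formula per interaction. The object names below are illustrative, and the formulas are only built as strings, not fitted.

```r
# The five main effects in the current model
main_effects <- c("type", "region", "quarter", "year", "x_large_bags")

# All unordered pairs of main effects, one pair per column
pair_matrix <- combn(main_effects, 2)

# Turn each pair into an interaction term such as "type:region"
interaction_terms <- apply(pair_matrix, 2, paste, collapse = ":")
length(interaction_terms)  # 10

# One candidate formula per interaction, each extending the main-effects model
base_rhs <- paste(main_effects, collapse = " + ")
formulas <- paste("average_price ~", base_rhs, "+", interaction_terms)
print(formulas)
```

Each string could then be passed through `as.formula()` to `lm()` with `data = trimmed_avocados` to fit the ten candidate models in a loop.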
model5pa <- lm(average_price ~ type + region + quarter + year + x_large_bags + type:region, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5pa)
summary(model5pa)
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## x_large_bags + type:region, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.00812 -0.13347 -0.00249 0.13359 1.48016
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.203e+00 1.855e-02 64.874 < 2e-16
## typeorganic 4.246e-01 2.558e-02 16.598 < 2e-16
## regionAtlanta -2.801e-01 2.558e-02 -10.950 < 2e-16
## regionBaltimoreWashington -4.684e-03 2.558e-02 -0.183 0.854724
## regionBoise -2.727e-01 2.558e-02 -10.660 < 2e-16
## regionBoston -4.441e-02 2.558e-02 -1.736 0.082557
## regionBuffaloRochester 3.352e-02 2.558e-02 1.310 0.190080
## regionCalifornia -2.474e-01 2.600e-02 -9.516 < 2e-16
## regionCharlotte -7.369e-02 2.558e-02 -2.881 0.003973
## regionChicago 2.033e-02 2.558e-02 0.795 0.426797
## regionCincinnatiDayton -3.334e-01 2.558e-02 -13.034 < 2e-16
## regionColumbus -2.826e-01 2.558e-02 -11.048 < 2e-16
## regionDallasFtWorth -5.026e-01 2.558e-02 -19.647 < 2e-16
## regionDenver -2.748e-01 2.558e-02 -10.743 < 2e-16
## regionDetroit -2.260e-01 2.562e-02 -8.823 < 2e-16
## regionGrandRapids -2.435e-02 2.559e-02 -0.951 0.341382
## regionGreatLakes -1.718e-01 2.619e-02 -6.560 5.54e-11
## regionHarrisburgScranton -9.003e-02 2.558e-02 -3.519 0.000434
## regionHartfordSpringfield 5.926e-02 2.558e-02 2.317 0.020528
## regionHouston -5.239e-01 2.558e-02 -20.479 < 2e-16
## regionIndianapolis -2.041e-01 2.558e-02 -7.978 1.57e-15
## regionJacksonville -1.552e-01 2.558e-02 -6.067 1.33e-09
## regionLasVegas -3.358e-01 2.558e-02 -13.126 < 2e-16
## regionLosAngeles -3.755e-01 2.583e-02 -14.536 < 2e-16
## regionLouisville -2.435e-01 2.558e-02 -9.518 < 2e-16
## regionMiamiFtLauderdale -9.464e-02 2.558e-02 -3.700 0.000217
## regionMidsouth -1.426e-01 2.561e-02 -5.570 2.58e-08
## regionNashville -3.359e-01 2.558e-02 -13.132 < 2e-16
## regionNewOrleansMobile -2.639e-01 2.558e-02 -10.313 < 2e-16
## regionNewYork 5.313e-02 2.558e-02 2.077 0.037842
## regionNortheast -5.307e-03 2.560e-02 -0.207 0.835817
## regionNorthernNewEngland -8.857e-02 2.558e-02 -3.463 0.000536
## regionOrlando -1.345e-01 2.558e-02 -5.257 1.48e-07
## regionPhiladelphia 4.753e-02 2.558e-02 1.858 0.063204
## regionPhoenixTucson -6.206e-01 2.558e-02 -24.261 < 2e-16
## regionPittsburgh -9.812e-02 2.558e-02 -3.836 0.000126
## regionPlains -1.841e-01 2.560e-02 -7.192 6.66e-13
## regionPortland -3.023e-01 2.558e-02 -11.817 < 2e-16
## regionRaleighGreensboro -1.217e-01 2.558e-02 -4.757 1.98e-06
## regionRichmondNorfolk -2.290e-01 2.558e-02 -8.952 < 2e-16
## regionRoanoke -2.528e-01 2.558e-02 -9.881 < 2e-16
## regionSacramento -7.492e-02 2.558e-02 -2.929 0.003407
## regionSanDiego -2.874e-01 2.558e-02 -11.233 < 2e-16
## regionSanFrancisco 4.827e-02 2.558e-02 1.887 0.059175
## regionSeattle -1.790e-01 2.558e-02 -6.998 2.69e-12
## regionSouthCarolina -2.027e-01 2.558e-02 -7.923 2.44e-15
## regionSouthCentral -4.814e-01 2.568e-02 -18.742 < 2e-16
## regionSoutheast -1.877e-01 2.567e-02 -7.310 2.79e-13
## regionSpokane -2.328e-01 2.558e-02 -9.099 < 2e-16
## regionStLouis -1.632e-01 2.558e-02 -6.378 1.84e-10
## regionSyracuse 3.817e-02 2.558e-02 1.492 0.135705
## regionTampa -1.473e-01 2.558e-02 -5.759 8.62e-09
## regionTotalUS -2.734e-01 3.186e-02 -8.583 < 2e-16
## regionWest -3.643e-01 2.559e-02 -14.235 < 2e-16
## regionWestTexNewMexico -5.068e-01 2.558e-02 -19.813 < 2e-16
## quarter2 8.101e-02 5.129e-03 15.793 < 2e-16
## quarter3 2.186e-01 5.134e-03 42.587 < 2e-16
## quarter4 1.620e-01 5.093e-03 31.820 < 2e-16
## year2016 -3.735e-02 4.455e-03 -8.385 < 2e-16
## year2017 1.383e-01 4.444e-03 31.110 < 2e-16
## year2018 8.670e-02 7.937e-03 10.923 < 2e-16
## x_large_bags 1.318e-07 1.499e-07 0.879 0.379416
## typeorganic:regionAtlanta 1.139e-01 3.618e-02 3.149 0.001642
## typeorganic:regionBaltimoreWashington -4.437e-02 3.618e-02 -1.226 0.220035
## typeorganic:regionBoise 1.196e-01 3.618e-02 3.307 0.000946
## typeorganic:regionBoston 2.849e-02 3.618e-02 0.788 0.430916
## typeorganic:regionBuffaloRochester -1.555e-01 3.618e-02 -4.298 1.74e-05
## typeorganic:regionCalifornia 1.593e-01 3.647e-02 4.367 1.27e-05
## typeorganic:regionCharlotte 2.374e-01 3.618e-02 6.561 5.48e-11
## typeorganic:regionChicago -4.944e-02 3.618e-02 -1.367 0.171744
## typeorganic:regionCincinnatiDayton -3.699e-02 3.618e-02 -1.022 0.306593
## typeorganic:regionColumbus -5.140e-02 3.618e-02 -1.421 0.155386
## typeorganic:regionDallasFtWorth 5.403e-02 3.618e-02 1.493 0.135327
## typeorganic:regionDenver -1.353e-01 3.618e-02 -3.741 0.000184
## typeorganic:regionDetroit -1.190e-01 3.620e-02 -3.288 0.001010
## typeorganic:regionGrandRapids -6.400e-02 3.618e-02 -1.769 0.076968
## typeorganic:regionGreatLakes -1.063e-01 3.661e-02 -2.903 0.003698
## typeorganic:regionHarrisburgScranton 8.447e-02 3.618e-02 2.335 0.019563
## typeorganic:regionHartfordSpringfield 3.967e-01 3.618e-02 10.965 < 2e-16
## typeorganic:regionHouston 2.134e-02 3.618e-02 0.590 0.555192
## typeorganic:regionIndianapolis -8.609e-02 3.618e-02 -2.380 0.017343
## typeorganic:regionJacksonville 2.102e-01 3.618e-02 5.810 6.37e-09
## typeorganic:regionLasVegas 3.113e-01 3.618e-02 8.606 < 2e-16
## typeorganic:regionLosAngeles 5.770e-02 3.635e-02 1.587 0.112476
## typeorganic:regionLouisville -6.178e-02 3.618e-02 -1.708 0.087678
## typeorganic:regionMiamiFtLauderdale -7.601e-02 3.618e-02 -2.101 0.035652
## typeorganic:regionMidsouth -2.831e-02 3.620e-02 -0.782 0.434169
## typeorganic:regionNashville -2.610e-02 3.618e-02 -0.721 0.470616
## typeorganic:regionNewOrleansMobile 1.486e-02 3.618e-02 0.411 0.681207
## typeorganic:regionNewYork 2.266e-01 3.618e-02 6.263 3.86e-10
## typeorganic:regionNortheast 9.140e-02 3.619e-02 2.525 0.011567
## typeorganic:regionNorthernNewEngland 9.816e-03 3.618e-02 0.271 0.786139
## typeorganic:regionOrlando 1.591e-01 3.618e-02 4.399 1.09e-05
## typeorganic:regionPhiladelphia 4.709e-02 3.618e-02 1.302 0.193037
## typeorganic:regionPhoenixTucson 5.680e-01 3.618e-02 15.700 < 2e-16
## typeorganic:regionPittsburgh -1.972e-01 3.618e-02 -5.451 5.06e-08
## typeorganic:regionPlains 1.183e-01 3.619e-02 3.269 0.001082
## typeorganic:regionPortland 1.179e-01 3.618e-02 3.259 0.001120
## typeorganic:regionRaleighGreensboro 2.315e-01 3.618e-02 6.400 1.59e-10
## typeorganic:regionRichmondNorfolk -8.148e-02 3.618e-02 -2.252 0.024322
## typeorganic:regionRoanoke -1.207e-01 3.618e-02 -3.338 0.000847
## typeorganic:regionSacramento 2.708e-01 3.618e-02 7.485 7.48e-14
## typeorganic:regionSanDiego 2.489e-01 3.618e-02 6.880 6.18e-12
## typeorganic:regionSanFrancisco 3.897e-01 3.618e-02 10.771 < 2e-16
## typeorganic:regionSeattle 1.211e-01 3.618e-02 3.347 0.000819
## typeorganic:regionSouthCarolina 8.973e-02 3.618e-02 2.480 0.013136
## typeorganic:regionSouthCentral 4.114e-02 3.625e-02 1.135 0.256458
## typeorganic:regionSoutheast 4.737e-02 3.624e-02 1.307 0.191198
## typeorganic:regionSpokane 2.346e-01 3.618e-02 6.486 9.03e-11
## typeorganic:regionStLouis 6.535e-02 3.618e-02 1.806 0.070875
## typeorganic:regionSyracuse -1.578e-01 3.618e-02 -4.361 1.30e-05
## typeorganic:regionTampa -9.910e-03 3.618e-02 -0.274 0.784145
## typeorganic:regionTotalUS 4.616e-02 4.086e-02 1.130 0.258597
## typeorganic:regionWest 1.503e-01 3.618e-02 4.154 3.28e-05
## typeorganic:regionWestTexNewMexico 4.234e-01 3.626e-02 11.676 < 2e-16
##
## (Intercept) ***
## typeorganic ***
## regionAtlanta ***
## regionBaltimoreWashington
## regionBoise ***
## regionBoston .
## regionBuffaloRochester
## regionCalifornia ***
## regionCharlotte **
## regionChicago
## regionCincinnatiDayton ***
## regionColumbus ***
## regionDallasFtWorth ***
## regionDenver ***
## regionDetroit ***
## regionGrandRapids
## regionGreatLakes ***
## regionHarrisburgScranton ***
## regionHartfordSpringfield *
## regionHouston ***
## regionIndianapolis ***
## regionJacksonville ***
## regionLasVegas ***
## regionLosAngeles ***
## regionLouisville ***
## regionMiamiFtLauderdale ***
## regionMidsouth ***
## regionNashville ***
## regionNewOrleansMobile ***
## regionNewYork *
## regionNortheast
## regionNorthernNewEngland ***
## regionOrlando ***
## regionPhiladelphia .
## regionPhoenixTucson ***
## regionPittsburgh ***
## regionPlains ***
## regionPortland ***
## regionRaleighGreensboro ***
## regionRichmondNorfolk ***
## regionRoanoke ***
## regionSacramento **
## regionSanDiego ***
## regionSanFrancisco .
## regionSeattle ***
## regionSouthCarolina ***
## regionSouthCentral ***
## regionSoutheast ***
## regionSpokane ***
## regionStLouis ***
## regionSyracuse
## regionTampa ***
## regionTotalUS ***
## regionWest ***
## regionWestTexNewMexico ***
## quarter2 ***
## quarter3 ***
## quarter4 ***
## year2016 ***
## year2017 ***
## year2018 ***
## x_large_bags
## typeorganic:regionAtlanta **
## typeorganic:regionBaltimoreWashington
## typeorganic:regionBoise ***
## typeorganic:regionBoston
## typeorganic:regionBuffaloRochester ***
## typeorganic:regionCalifornia ***
## typeorganic:regionCharlotte ***
## typeorganic:regionChicago
## typeorganic:regionCincinnatiDayton
## typeorganic:regionColumbus
## typeorganic:regionDallasFtWorth
## typeorganic:regionDenver ***
## typeorganic:regionDetroit **
## typeorganic:regionGrandRapids .
## typeorganic:regionGreatLakes **
## typeorganic:regionHarrisburgScranton *
## typeorganic:regionHartfordSpringfield ***
## typeorganic:regionHouston
## typeorganic:regionIndianapolis *
## typeorganic:regionJacksonville ***
## typeorganic:regionLasVegas ***
## typeorganic:regionLosAngeles
## typeorganic:regionLouisville .
## typeorganic:regionMiamiFtLauderdale *
## typeorganic:regionMidsouth
## typeorganic:regionNashville
## typeorganic:regionNewOrleansMobile
## typeorganic:regionNewYork ***
## typeorganic:regionNortheast *
## typeorganic:regionNorthernNewEngland
## typeorganic:regionOrlando ***
## typeorganic:regionPhiladelphia
## typeorganic:regionPhoenixTucson ***
## typeorganic:regionPittsburgh ***
## typeorganic:regionPlains **
## typeorganic:regionPortland **
## typeorganic:regionRaleighGreensboro ***
## typeorganic:regionRichmondNorfolk *
## typeorganic:regionRoanoke ***
## typeorganic:regionSacramento ***
## typeorganic:regionSanDiego ***
## typeorganic:regionSanFrancisco ***
## typeorganic:regionSeattle ***
## typeorganic:regionSouthCarolina *
## typeorganic:regionSouthCentral
## typeorganic:regionSoutheast
## typeorganic:regionSpokane ***
## typeorganic:regionStLouis .
## typeorganic:regionSyracuse ***
## typeorganic:regionTampa
## typeorganic:regionTotalUS
## typeorganic:regionWest ***
## typeorganic:regionWestTexNewMexico ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2351 on 18134 degrees of freedom
## Multiple R-squared: 0.6611, Adjusted R-squared: 0.659
## F-statistic: 310.3 on 114 and 18134 DF, p-value: < 2.2e-16
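# Candidate interaction type:quarter: lets the organic price premium differ by quarter
model5pb <- lm(average_price ~ type + region + quarter + year + x_large_bags + type:quarter,
               data = trimmed_avocados)
par(mfrow = c(2, 2)) # show the four diagnostic plots in a 2 x 2 grid
plot(model5pb)
summary(model5pb)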
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## x_large_bags + type:quarter, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.02270 -0.14602 -0.00362 0.14398 1.44165
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.180e+00 1.454e-02 81.176 < 2e-16 ***
## typeorganic 4.717e-01 6.719e-03 70.203 < 2e-16 ***
## regionAtlanta -2.233e-01 1.907e-02 -11.713 < 2e-16 ***
## regionBaltimoreWashington -2.699e-02 1.907e-02 -1.416 0.156893
## regionBoise -2.129e-01 1.907e-02 -11.163 < 2e-16 ***
## regionBoston -3.020e-02 1.907e-02 -1.584 0.113308
## regionBuffaloRochester -4.425e-02 1.907e-02 -2.320 0.020331 *
## regionCalifornia -1.718e-01 1.917e-02 -8.962 < 2e-16 ***
## regionCharlotte 4.497e-02 1.907e-02 2.358 0.018367 *
## regionChicago -4.649e-03 1.907e-02 -0.244 0.807387
## regionCincinnatiDayton -3.521e-01 1.907e-02 -18.465 < 2e-16 ***
## regionColumbus -3.085e-01 1.907e-02 -16.177 < 2e-16 ***
## regionDallasFtWorth -4.759e-01 1.907e-02 -24.957 < 2e-16 ***
## regionDenver -3.425e-01 1.907e-02 -17.960 < 2e-16 ***
## regionDetroit -2.868e-01 1.908e-02 -15.034 < 2e-16 ***
## regionGrandRapids -5.696e-02 1.907e-02 -2.987 0.002824 **
## regionGreatLakes -2.298e-01 1.921e-02 -11.964 < 2e-16 ***
## regionHarrisburgScranton -4.788e-02 1.907e-02 -2.511 0.012048 *
## regionHartfordSpringfield 2.576e-01 1.907e-02 13.508 < 2e-16 ***
## regionHouston -5.134e-01 1.907e-02 -26.926 < 2e-16 ***
## regionIndianapolis -2.473e-01 1.907e-02 -12.970 < 2e-16 ***
## regionJacksonville -5.016e-02 1.907e-02 -2.631 0.008531 **
## regionLasVegas -1.801e-01 1.907e-02 -9.444 < 2e-16 ***
## regionLosAngeles -3.497e-01 1.913e-02 -18.284 < 2e-16 ***
## regionLouisville -2.744e-01 1.907e-02 -14.392 < 2e-16 ***
## regionMiamiFtLauderdale -1.328e-01 1.907e-02 -6.967 3.35e-12 ***
## regionMidsouth -1.578e-01 1.907e-02 -8.274 < 2e-16 ***
## regionNashville -3.490e-01 1.907e-02 -18.303 < 2e-16 ***
## regionNewOrleansMobile -2.568e-01 1.907e-02 -13.466 < 2e-16 ***
## regionNewYork 1.662e-01 1.907e-02 8.714 < 2e-16 ***
## regionNortheast 3.942e-02 1.907e-02 2.067 0.038772 *
## regionNorthernNewEngland -8.372e-02 1.907e-02 -4.390 1.14e-05 ***
## regionOrlando -5.505e-02 1.907e-02 -2.887 0.003892 **
## regionPhiladelphia 7.102e-02 1.907e-02 3.725 0.000196 ***
## regionPhoenixTucson -3.367e-01 1.907e-02 -17.659 < 2e-16 ***
## regionPittsburgh -1.967e-01 1.907e-02 -10.317 < 2e-16 ***
## regionPlains -1.258e-01 1.907e-02 -6.594 4.39e-11 ***
## regionPortland -2.434e-01 1.907e-02 -12.762 < 2e-16 ***
## regionRaleighGreensboro -5.977e-03 1.907e-02 -0.313 0.753941
## regionRichmondNorfolk -2.698e-01 1.907e-02 -14.149 < 2e-16 ***
## regionRoanoke -3.131e-01 1.907e-02 -16.423 < 2e-16 ***
## regionSacramento 6.034e-02 1.907e-02 3.164 0.001556 **
## regionSanDiego -1.630e-01 1.907e-02 -8.548 < 2e-16 ***
## regionSanFrancisco 2.430e-01 1.907e-02 12.742 < 2e-16 ***
## regionSeattle -1.185e-01 1.907e-02 -6.214 5.28e-10 ***
## regionSouthCarolina -1.580e-01 1.907e-02 -8.284 < 2e-16 ***
## regionSouthCentral -4.628e-01 1.909e-02 -24.240 < 2e-16 ***
## regionSoutheast -1.659e-01 1.909e-02 -8.690 < 2e-16 ***
## regionSpokane -1.154e-01 1.907e-02 -6.052 1.46e-09 ***
## regionStLouis -1.306e-01 1.907e-02 -6.850 7.60e-12 ***
## regionSyracuse -4.071e-02 1.907e-02 -2.135 0.032785 *
## regionTampa -1.524e-01 1.907e-02 -7.993 1.40e-15 ***
## regionTotalUS -2.668e-01 2.064e-02 -12.928 < 2e-16 ***
## regionWest -2.897e-01 1.907e-02 -15.193 < 2e-16 ***
## regionWestTexNewMexico -2.969e-01 1.911e-02 -15.537 < 2e-16 ***
## quarter2 6.536e-02 7.416e-03 8.814 < 2e-16 ***
## quarter3 1.848e-01 7.423e-03 24.898 < 2e-16 ***
## quarter4 1.530e-01 7.364e-03 20.776 < 2e-16 ***
## year2016 -3.800e-02 4.689e-03 -8.102 5.72e-16 ***
## year2017 1.374e-01 4.674e-03 29.392 < 2e-16 ***
## year2018 8.529e-02 8.351e-03 10.213 < 2e-16 ***
## x_large_bags 3.916e-07 1.246e-07 3.142 0.001682 **
## typeorganic:quarter2 3.034e-02 1.015e-02 2.989 0.002800 **
## typeorganic:quarter3 6.653e-02 1.015e-02 6.553 5.80e-11 ***
## typeorganic:quarter4 1.817e-02 1.008e-02 1.803 0.071446 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2479 on 18184 degrees of freedom
## Multiple R-squared: 0.6224, Adjusted R-squared: 0.621
## F-statistic: 468.3 on 64 and 18184 DF, p-value: < 2.2e-16
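One way to check whether an interaction term earns its place is a nested-model F-test with `anova()`. The sketch below is illustrative only: it uses the built-in `mtcars` data rather than `trimmed_avocados`, and the names `base_model` and `interaction_model` are hypothetical. In this script the analogous comparison would be between the main-effects model and an interaction model such as `model5pb`.

```r
# Illustrative sketch on built-in data (mtcars), not the avocado data
cars <- transform(mtcars, am = factor(am))

# Main-effects model and the same model plus one interaction term
base_model <- lm(mpg ~ wt + am, data = cars)
interaction_model <- lm(mpg ~ wt + am + wt:am, data = cars)

# Nested F-test: a small p-value suggests the interaction improves the fit
f_test <- anova(base_model, interaction_model)
print(f_test)
```

With a dataset of this size even small effects can come out as statistically significant, so it is worth reading the F-test alongside the change in adjusted R-squared.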
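# Candidate interaction type:year: lets the organic price premium differ by year
model5pc <- lm(average_price ~ type + region + quarter + year + x_large_bags + type:year,
               data = trimmed_avocados)
par(mfrow = c(2, 2)) # show the four diagnostic plots in a 2 x 2 grid
plot(model5pc)
summary(model5pc)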
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## x_large_bags + type:year, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.00898 -0.14443 -0.00472 0.13873 1.46680
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.118e+00 1.442e-02 77.501 < 2e-16 ***
## typeorganic 5.956e-01 6.569e-03 90.667 < 2e-16 ***
## regionAtlanta -2.232e-01 1.892e-02 -11.796 < 2e-16 ***
## regionBaltimoreWashington -2.687e-02 1.892e-02 -1.420 0.155567
## regionBoise -2.129e-01 1.892e-02 -11.252 < 2e-16 ***
## regionBoston -3.016e-02 1.892e-02 -1.594 0.110873
## regionBuffaloRochester -4.422e-02 1.892e-02 -2.337 0.019445 *
## regionCalifornia -1.678e-01 1.902e-02 -8.823 < 2e-16 ***
## regionCharlotte 4.499e-02 1.892e-02 2.378 0.017419 *
## regionChicago -4.393e-03 1.892e-02 -0.232 0.816388
## regionCincinnatiDayton -3.519e-01 1.892e-02 -18.601 < 2e-16 ***
## regionColumbus -3.083e-01 1.892e-02 -16.297 < 2e-16 ***
## regionDallasFtWorth -4.756e-01 1.892e-02 -25.137 < 2e-16 ***
## regionDenver -3.425e-01 1.892e-02 -18.101 < 2e-16 ***
## regionDetroit -2.856e-01 1.893e-02 -15.087 < 2e-16 ***
## regionGrandRapids -5.635e-02 1.892e-02 -2.978 0.002904 **
## regionGreatLakes -2.250e-01 1.906e-02 -11.803 < 2e-16 ***
## regionHarrisburgScranton -4.780e-02 1.892e-02 -2.526 0.011537 *
## regionHartfordSpringfield 2.576e-01 1.892e-02 13.615 < 2e-16 ***
## regionHouston -5.132e-01 1.892e-02 -27.126 < 2e-16 ***
## regionIndianapolis -2.471e-01 1.892e-02 -13.062 < 2e-16 ***
## regionJacksonville -5.011e-02 1.892e-02 -2.649 0.008085 **
## regionLasVegas -1.801e-01 1.892e-02 -9.520 < 2e-16 ***
## regionLosAngeles -3.466e-01 1.898e-02 -18.265 < 2e-16 ***
## regionLouisville -2.744e-01 1.892e-02 -14.502 < 2e-16 ***
## regionMiamiFtLauderdale -1.326e-01 1.892e-02 -7.011 2.45e-12 ***
## regionMidsouth -1.568e-01 1.893e-02 -8.285 < 2e-16 ***
## regionNashville -3.490e-01 1.892e-02 -18.445 < 2e-16 ***
## regionNewOrleansMobile -2.564e-01 1.892e-02 -13.553 < 2e-16 ***
## regionNewYork 1.664e-01 1.892e-02 8.796 < 2e-16 ***
## regionNortheast 4.039e-02 1.893e-02 2.134 0.032855 *
## regionNorthernNewEngland -8.367e-02 1.892e-02 -4.422 9.83e-06 ***
## regionOrlando -5.490e-02 1.892e-02 -2.902 0.003714 **
## regionPhiladelphia 7.107e-02 1.892e-02 3.756 0.000173 ***
## regionPhoenixTucson -3.366e-01 1.892e-02 -17.793 < 2e-16 ***
## regionPittsburgh -1.967e-01 1.892e-02 -10.398 < 2e-16 ***
## regionPlains -1.249e-01 1.892e-02 -6.603 4.14e-11 ***
## regionPortland -2.433e-01 1.892e-02 -12.861 < 2e-16 ***
## regionRaleighGreensboro -5.938e-03 1.892e-02 -0.314 0.753649
## regionRichmondNorfolk -2.697e-01 1.892e-02 -14.257 < 2e-16 ***
## regionRoanoke -3.131e-01 1.892e-02 -16.550 < 2e-16 ***
## regionSacramento 6.047e-02 1.892e-02 3.196 0.001396 **
## regionSanDiego -1.629e-01 1.892e-02 -8.611 < 2e-16 ***
## regionSanFrancisco 2.431e-01 1.892e-02 12.849 < 2e-16 ***
## regionSeattle -1.185e-01 1.892e-02 -6.262 3.89e-10 ***
## regionSouthCarolina -1.578e-01 1.892e-02 -8.342 < 2e-16 ***
## regionSouthCentral -4.608e-01 1.894e-02 -24.326 < 2e-16 ***
## regionSoutheast -1.640e-01 1.894e-02 -8.658 < 2e-16 ***
## regionSpokane -1.154e-01 1.892e-02 -6.101 1.07e-09 ***
## regionStLouis -1.305e-01 1.892e-02 -6.897 5.49e-12 ***
## regionSyracuse -4.071e-02 1.892e-02 -2.152 0.031432 *
## regionTampa -1.523e-01 1.892e-02 -8.048 8.93e-16 ***
## regionTotalUS -2.505e-01 2.049e-02 -12.226 < 2e-16 ***
## regionWest -2.891e-01 1.892e-02 -15.280 < 2e-16 ***
## regionWestTexNewMexico -2.967e-01 1.896e-02 -15.650 < 2e-16 ***
## quarter2 8.091e-02 5.363e-03 15.085 < 2e-16 ***
## quarter3 2.186e-01 5.366e-03 40.744 < 2e-16 ***
## quarter4 1.620e-01 5.327e-03 30.417 < 2e-16 ***
## year2016 2.694e-02 6.596e-03 4.084 4.45e-05 ***
## year2017 2.152e-01 6.582e-03 32.691 < 2e-16 ***
## year2018 1.641e-01 1.128e-02 14.549 < 2e-16 ***
## x_large_bags 1.338e-07 1.241e-07 1.078 0.281087
## typeorganic:year2016 -1.285e-01 9.306e-03 -13.813 < 2e-16 ***
## typeorganic:year2017 -1.540e-01 9.275e-03 -16.600 < 2e-16 ***
## typeorganic:year2018 -1.548e-01 1.520e-02 -10.184 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.246 on 18184 degrees of freedom
## Multiple R-squared: 0.6282, Adjusted R-squared: 0.6269
## F-statistic: 480.1 on 64 and 18184 DF, p-value: < 2.2e-16
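# Candidate interaction type:x_large_bags: lets the x_large_bags slope differ by type
model5pd <- lm(average_price ~ type + region + quarter + year + x_large_bags + type:x_large_bags,
               data = trimmed_avocados)
par(mfrow = c(2, 2)) # show the four diagnostic plots in a 2 x 2 grid
plot(model5pd)
summary(model5pd)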
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## x_large_bags + type:x_large_bags, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.03574 -0.14591 -0.00478 0.14434 1.43935
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.168e+00 1.429e-02 81.734 < 2e-16 ***
## typeorganic 4.978e-01 3.757e-03 132.483 < 2e-16 ***
## regionAtlanta -2.233e-01 1.909e-02 -11.701 < 2e-16 ***
## regionBaltimoreWashington -2.699e-02 1.909e-02 -1.414 0.157339
## regionBoise -2.130e-01 1.909e-02 -11.159 < 2e-16 ***
## regionBoston -3.020e-02 1.909e-02 -1.582 0.113671
## regionBuffaloRochester -4.425e-02 1.909e-02 -2.318 0.020456 *
## regionCalifornia -1.717e-01 1.918e-02 -8.949 < 2e-16 ***
## regionCharlotte 4.497e-02 1.909e-02 2.356 0.018481 *
## regionChicago -4.644e-03 1.909e-02 -0.243 0.807777
## regionCincinnatiDayton -3.521e-01 1.909e-02 -18.446 < 2e-16 ***
## regionColumbus -3.085e-01 1.909e-02 -16.160 < 2e-16 ***
## regionDallasFtWorth -4.759e-01 1.909e-02 -24.932 < 2e-16 ***
## regionDenver -3.425e-01 1.909e-02 -17.943 < 2e-16 ***
## regionDetroit -2.868e-01 1.910e-02 -15.017 < 2e-16 ***
## regionGrandRapids -5.695e-02 1.909e-02 -2.983 0.002857 **
## regionGreatLakes -2.297e-01 1.923e-02 -11.947 < 2e-16 ***
## regionHarrisburgScranton -4.788e-02 1.909e-02 -2.508 0.012135 *
## regionHartfordSpringfield 2.576e-01 1.909e-02 13.494 < 2e-16 ***
## regionHouston -5.134e-01 1.909e-02 -26.899 < 2e-16 ***
## regionIndianapolis -2.473e-01 1.909e-02 -12.957 < 2e-16 ***
## regionJacksonville -5.016e-02 1.909e-02 -2.628 0.008598 **
## regionLasVegas -1.801e-01 1.909e-02 -9.435 < 2e-16 ***
## regionLosAngeles -3.496e-01 1.915e-02 -18.263 < 2e-16 ***
## regionLouisville -2.744e-01 1.909e-02 -14.377 < 2e-16 ***
## regionMiamiFtLauderdale -1.328e-01 1.909e-02 -6.960 3.52e-12 ***
## regionMidsouth -1.578e-01 1.909e-02 -8.265 < 2e-16 ***
## regionNashville -3.490e-01 1.909e-02 -18.285 < 2e-16 ***
## regionNewOrleansMobile -2.568e-01 1.909e-02 -13.453 < 2e-16 ***
## regionNewYork 1.662e-01 1.909e-02 8.706 < 2e-16 ***
## regionNortheast 3.944e-02 1.909e-02 2.066 0.038871 *
## regionNorthernNewEngland -8.372e-02 1.909e-02 -4.386 1.16e-05 ***
## regionOrlando -5.505e-02 1.909e-02 -2.884 0.003929 **
## regionPhiladelphia 7.102e-02 1.909e-02 3.721 0.000199 ***
## regionPhoenixTucson -3.367e-01 1.909e-02 -17.642 < 2e-16 ***
## regionPittsburgh -1.967e-01 1.909e-02 -10.307 < 2e-16 ***
## regionPlains -1.258e-01 1.909e-02 -6.589 4.56e-11 ***
## regionPortland -2.447e-01 1.909e-02 -12.817 < 2e-16 ***
## regionRaleighGreensboro -5.976e-03 1.909e-02 -0.313 0.754207
## regionRichmondNorfolk -2.698e-01 1.909e-02 -14.135 < 2e-16 ***
## regionRoanoke -3.131e-01 1.909e-02 -16.406 < 2e-16 ***
## regionSacramento 6.034e-02 1.909e-02 3.161 0.001572 **
## regionSanDiego -1.630e-01 1.909e-02 -8.539 < 2e-16 ***
## regionSanFrancisco 2.430e-01 1.909e-02 12.730 < 2e-16 ***
## regionSeattle -1.212e-01 1.912e-02 -6.341 2.34e-10 ***
## regionSouthCarolina -1.580e-01 1.909e-02 -8.276 < 2e-16 ***
## regionSouthCentral -4.628e-01 1.911e-02 -24.214 < 2e-16 ***
## regionSoutheast -1.658e-01 1.911e-02 -8.679 < 2e-16 ***
## regionSpokane -1.156e-01 1.909e-02 -6.056 1.42e-09 ***
## regionStLouis -1.306e-01 1.909e-02 -6.843 7.98e-12 ***
## regionSyracuse -4.071e-02 1.909e-02 -2.133 0.032957 *
## regionTampa -1.524e-01 1.909e-02 -7.985 1.49e-15 ***
## regionTotalUS -2.719e-01 2.084e-02 -13.048 < 2e-16 ***
## regionWest -2.951e-01 1.920e-02 -15.366 < 2e-16 ***
## regionWestTexNewMexico -2.970e-01 1.913e-02 -15.524 < 2e-16 ***
## quarter2 8.054e-02 5.411e-03 14.885 < 2e-16 ***
## quarter3 2.180e-01 5.414e-03 40.259 < 2e-16 ***
## quarter4 1.616e-01 5.377e-03 30.058 < 2e-16 ***
## year2016 -3.798e-02 4.694e-03 -8.092 6.25e-16 ***
## year2017 1.370e-01 4.684e-03 29.241 < 2e-16 ***
## year2018 8.319e-02 8.405e-03 9.898 < 2e-16 ***
## x_large_bags 3.865e-07 1.250e-07 3.091 0.001995 **
## typeorganic:x_large_bags 4.737e-04 1.827e-04 2.593 0.009522 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2481 on 18186 degrees of freedom
## Multiple R-squared: 0.6216, Adjusted R-squared: 0.6203
## F-statistic: 481.8 on 62 and 18186 DF, p-value: < 2.2e-16
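The `x_large_bags` estimate above (3.865e-07) looks tiny because it is the change in `average_price` for a one-unit change in `x_large_bags`. Rescaling a predictor, for example to thousands, makes the coefficient easier to read without changing the fit. The sketch below shows this on the built-in `mtcars` data, not on the avocado data; `disp_k` is a hypothetical column name.

```r
# Illustrative sketch on built-in data (mtcars), not the avocado data
fit_raw <- lm(mpg ~ disp, data = mtcars)

# Same predictor expressed in units of 1,000
cars_scaled <- transform(mtcars, disp_k = disp / 1000)
fit_scaled <- lm(mpg ~ disp_k, data = cars_scaled)

# The slope is multiplied by 1,000; the fit itself is unchanged
slope_raw_x1000 <- unname(coef(fit_raw)["disp"]) * 1000
slope_scaled <- unname(coef(fit_scaled)["disp_k"])
print(c(slope_raw_x1000 = slope_raw_x1000, slope_scaled = slope_scaled))
print(c(r2_raw = summary(fit_raw)$r.squared, r2_scaled = summary(fit_scaled)$r.squared))
```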
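# Candidate interaction region:quarter: lets the seasonal pattern differ by region
model5pe <- lm(average_price ~ type + region + quarter + year + x_large_bags + region:quarter,
               data = trimmed_avocados)
par(mfrow = c(2, 2)) # show the four diagnostic plots in a 2 x 2 grid
plot(model5pe)
summary(model5pe)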
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## x_large_bags + region:quarter, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.06468 -0.14582 0.00048 0.14087 1.38018
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.216e+00 2.423e-02 50.190 < 2e-16 ***
## typeorganic 4.985e-01 3.663e-03 136.095 < 2e-16 ***
## regionAtlanta -2.579e-01 3.388e-02 -7.611 2.85e-14 ***
## regionBaltimoreWashington -8.986e-02 3.388e-02 -2.652 0.008000 **
## regionBoise -2.854e-01 3.388e-02 -8.424 < 2e-16 ***
## regionBoston -7.093e-03 3.388e-02 -0.209 0.834158
## regionBuffaloRochester -3.109e-02 3.388e-02 -0.918 0.358774
## regionCalifornia -2.868e-01 3.394e-02 -8.450 < 2e-16 ***
## regionCharlotte -2.147e-02 3.388e-02 -0.634 0.526347
## regionChicago -7.400e-02 3.388e-02 -2.184 0.028945 *
## regionCincinnatiDayton -4.353e-01 3.388e-02 -12.849 < 2e-16 ***
## regionColumbus -3.251e-01 3.388e-02 -9.595 < 2e-16 ***
## regionDallasFtWorth -4.853e-01 3.388e-02 -14.325 < 2e-16 ***
## regionDenver -4.216e-01 3.388e-02 -12.443 < 2e-16 ***
## regionDetroit -3.074e-01 3.389e-02 -9.071 < 2e-16 ***
## regionGrandRapids -1.295e-01 3.388e-02 -3.824 0.000132 ***
## regionGreatLakes -2.769e-01 3.398e-02 -8.150 3.88e-16 ***
## regionHarrisburgScranton -6.005e-02 3.388e-02 -1.773 0.076319 .
## regionHartfordSpringfield 2.290e-01 3.388e-02 6.759 1.43e-11 ***
## regionHouston -5.375e-01 3.388e-02 -15.867 < 2e-16 ***
## regionIndianapolis -2.742e-01 3.388e-02 -8.093 6.20e-16 ***
## regionJacksonville -1.104e-01 3.388e-02 -3.259 0.001121 **
## regionLasVegas -2.907e-01 3.388e-02 -8.581 < 2e-16 ***
## regionLosAngeles -4.383e-01 3.391e-02 -12.923 < 2e-16 ***
## regionLouisville -2.956e-01 3.388e-02 -8.725 < 2e-16 ***
## regionMiamiFtLauderdale -1.119e-01 3.388e-02 -3.302 0.000962 ***
## regionMidsouth -1.953e-01 3.388e-02 -5.764 8.33e-09 ***
## regionNashville -3.514e-01 3.388e-02 -10.372 < 2e-16 ***
## regionNewOrleansMobile -3.177e-01 3.388e-02 -9.377 < 2e-16 ***
## regionNewYork 1.048e-01 3.388e-02 3.094 0.001979 **
## regionNortheast 1.933e-02 3.388e-02 0.570 0.568361
## regionNorthernNewEngland -5.982e-02 3.388e-02 -1.766 0.077455 .
## regionOrlando -1.034e-01 3.388e-02 -3.053 0.002269 **
## regionPhiladelphia 1.650e-02 3.388e-02 0.487 0.626253
## regionPhoenixTucson -4.456e-01 3.388e-02 -13.153 < 2e-16 ***
## regionPittsburgh -1.745e-01 3.388e-02 -5.151 2.62e-07 ***
## regionPlains -1.852e-01 3.388e-02 -5.466 4.67e-08 ***
## regionPortland -3.533e-01 3.388e-02 -10.429 < 2e-16 ***
## regionRaleighGreensboro -5.803e-02 3.388e-02 -1.713 0.086751 .
## regionRichmondNorfolk -2.636e-01 3.388e-02 -7.782 7.52e-15 ***
## regionRoanoke -3.123e-01 3.388e-02 -9.217 < 2e-16 ***
## regionSacramento -2.741e-02 3.388e-02 -0.809 0.418472
## regionSanDiego -2.868e-01 3.388e-02 -8.466 < 2e-16 ***
## regionSanFrancisco 9.026e-02 3.388e-02 2.664 0.007726 **
## regionSeattle -2.589e-01 3.388e-02 -7.642 2.25e-14 ***
## regionSouthCarolina -2.071e-01 3.388e-02 -6.114 9.93e-10 ***
## regionSouthCentral -4.798e-01 3.390e-02 -14.153 < 2e-16 ***
## regionSoutheast -2.084e-01 3.388e-02 -6.151 7.88e-10 ***
## regionSpokane -2.696e-01 3.388e-02 -7.958 1.85e-15 ***
## regionStLouis -1.910e-01 3.388e-02 -5.639 1.74e-08 ***
## regionSyracuse -2.764e-02 3.388e-02 -0.816 0.414661
## regionTampa -1.532e-01 3.388e-02 -4.523 6.14e-06 ***
## regionTotalUS -3.151e-01 3.466e-02 -9.091 < 2e-16 ***
## regionWest -3.903e-01 3.388e-02 -11.520 < 2e-16 ***
## regionWestTexNewMexico -3.665e-01 3.388e-02 -10.818 < 2e-16 ***
## quarter2 8.528e-02 3.644e-02 2.341 0.019266 *
## quarter3 9.278e-02 3.644e-02 2.546 0.010895 *
## quarter4 7.165e-02 3.618e-02 1.981 0.047660 *
## year2016 -3.808e-02 4.577e-03 -8.319 < 2e-16 ***
## year2017 1.373e-01 4.563e-03 30.081 < 2e-16 ***
## year2018 8.513e-02 8.151e-03 10.444 < 2e-16 ***
## x_large_bags 4.158e-07 1.233e-07 3.373 0.000746 ***
## regionAtlanta:quarter2 -8.875e-02 5.147e-02 -1.725 0.084627 .
## regionBaltimoreWashington:quarter2 9.216e-02 5.147e-02 1.791 0.073359 .
## regionBoise:quarter2 -9.544e-02 5.147e-02 -1.854 0.063692 .
## regionBoston:quarter2 1.139e-02 5.147e-02 0.221 0.824911
## regionBuffaloRochester:quarter2 8.166e-02 5.147e-02 1.587 0.112579
## regionCalifornia:quarter2 4.240e-03 5.147e-02 0.082 0.934345
## regionCharlotte:quarter2 6.218e-02 5.147e-02 1.208 0.226952
## regionChicago:quarter2 -4.249e-03 5.147e-02 -0.083 0.934198
## regionCincinnatiDayton:quarter2 1.014e-02 5.147e-02 0.197 0.843877
## regionColumbus:quarter2 -9.402e-02 5.147e-02 -1.827 0.067727 .
## regionDallasFtWorth:quarter2 -7.789e-02 5.147e-02 -1.513 0.130177
## regionDenver:quarter2 -1.578e-02 5.147e-02 -0.307 0.759141
## regionDetroit:quarter2 -3.691e-02 5.147e-02 -0.717 0.473257
## regionGrandRapids:quarter2 1.363e-01 5.147e-02 2.649 0.008086 **
## regionGreatLakes:quarter2 -1.091e-02 5.147e-02 -0.212 0.832191
## regionHarrisburgScranton:quarter2 6.543e-02 5.147e-02 1.271 0.203625
## regionHartfordSpringfield:quarter2 6.725e-02 5.147e-02 1.307 0.191332
## regionHouston:quarter2 -8.920e-02 5.147e-02 -1.733 0.083088 .
## regionIndianapolis:quarter2 -6.425e-02 5.147e-02 -1.248 0.211928
## regionJacksonville:quarter2 2.811e-02 5.147e-02 0.546 0.584928
## regionLasVegas:quarter2 -7.424e-02 5.147e-02 -1.443 0.149173
## regionLosAngeles:quarter2 -6.060e-02 5.147e-02 -1.177 0.239049
## regionLouisville:quarter2 -7.449e-02 5.147e-02 -1.447 0.147834
## regionMiamiFtLauderdale:quarter2 -1.020e-02 5.147e-02 -0.198 0.842828
## regionMidsouth:quarter2 -1.515e-02 5.147e-02 -0.294 0.768501
## regionNashville:quarter2 -1.026e-01 5.147e-02 -1.993 0.046304 *
## regionNewOrleansMobile:quarter2 8.341e-02 5.147e-02 1.621 0.105105
## regionNewYork:quarter2 8.732e-02 5.147e-02 1.697 0.089772 .
## regionNortheast:quarter2 5.500e-02 5.147e-02 1.069 0.285265
## regionNorthernNewEngland:quarter2 -6.770e-02 5.147e-02 -1.316 0.188354
## regionOrlando:quarter2 1.769e-02 5.147e-02 0.344 0.731089
## regionPhiladelphia:quarter2 1.100e-01 5.147e-02 2.137 0.032587 *
## regionPhoenixTucson:quarter2 -1.980e-02 5.147e-02 -0.385 0.700459
## regionPittsburgh:quarter2 -3.807e-02 5.147e-02 -0.740 0.459513
## regionPlains:quarter2 -4.009e-03 5.147e-02 -0.078 0.937911
## regionPortland:quarter2 -4.527e-02 5.147e-02 -0.880 0.379084
## regionRaleighGreensboro:quarter2 1.832e-03 5.147e-02 0.036 0.971604
## regionRichmondNorfolk:quarter2 -1.137e-01 5.147e-02 -2.209 0.027195 *
## regionRoanoke:quarter2 -1.312e-01 5.147e-02 -2.550 0.010779 *
## regionSacramento:quarter2 8.446e-02 5.147e-02 1.641 0.100786
## regionSanDiego:quarter2 -3.285e-03 5.147e-02 -0.064 0.949106
## regionSanFrancisco:quarter2 1.221e-01 5.147e-02 2.373 0.017637 *
## regionSeattle:quarter2 1.210e-02 5.147e-02 0.235 0.814101
## regionSouthCarolina:quarter2 2.735e-02 5.147e-02 0.531 0.595172
## regionSouthCentral:quarter2 -7.164e-02 5.147e-02 -1.392 0.163922
## regionSoutheast:quarter2 -9.837e-03 5.148e-02 -0.191 0.848456
## regionSpokane:quarter2 9.803e-03 5.147e-02 0.190 0.848939
## regionStLouis:quarter2 5.672e-02 5.147e-02 1.102 0.270444
## regionSyracuse:quarter2 6.494e-02 5.147e-02 1.262 0.207015
## regionTampa:quarter2 5.706e-03 5.147e-02 0.111 0.911722
## regionTotalUS:quarter2 -1.476e-02 5.149e-02 -0.287 0.774329
## regionWest:quarter2 -2.856e-02 5.147e-02 -0.555 0.578953
## regionWestTexNewMexico:quarter2 -9.603e-02 5.166e-02 -1.859 0.063053 .
## regionAtlanta:quarter3 1.224e-01 5.147e-02 2.378 0.017422 *
## regionBaltimoreWashington:quarter3 9.538e-02 5.147e-02 1.853 0.063854 .
## regionBoise:quarter3 2.521e-01 5.147e-02 4.898 9.79e-07 ***
## regionBoston:quarter3 -1.212e-03 5.147e-02 -0.024 0.981214
## regionBuffaloRochester:quarter3 -3.416e-02 5.147e-02 -0.664 0.506909
## regionCalifornia:quarter3 2.572e-01 5.147e-02 4.996 5.89e-07 ***
## regionCharlotte:quarter3 1.397e-01 5.147e-02 2.715 0.006641 **
## regionChicago:quarter3 1.740e-01 5.147e-02 3.381 0.000723 ***
## regionCincinnatiDayton:quarter3 2.128e-01 5.147e-02 4.135 3.57e-05 ***
## regionColumbus:quarter3 1.094e-01 5.147e-02 2.126 0.033525 *
## regionDallasFtWorth:quarter3 2.363e-02 5.147e-02 0.459 0.646184
## regionDenver:quarter3 2.124e-01 5.147e-02 4.128 3.68e-05 ***
## regionDetroit:quarter3 5.517e-02 5.147e-02 1.072 0.283742
## regionGrandRapids:quarter3 9.166e-02 5.147e-02 1.781 0.074936 .
## regionGreatLakes:quarter3 1.228e-01 5.147e-02 2.387 0.017003 *
## regionHarrisburgScranton:quarter3 6.457e-03 5.147e-02 0.125 0.900153
## regionHartfordSpringfield:quarter3 4.942e-02 5.147e-02 0.960 0.336930
## regionHouston:quarter3 7.247e-02 5.147e-02 1.408 0.159093
## regionIndianapolis:quarter3 9.223e-02 5.147e-02 1.792 0.073138 .
## regionJacksonville:quarter3 1.680e-01 5.147e-02 3.265 0.001098 **
## regionLasVegas:quarter3 2.954e-01 5.147e-02 5.740 9.61e-09 ***
## regionLosAngeles:quarter3 2.150e-01 5.147e-02 4.178 2.96e-05 ***
## regionLouisville:quarter3 8.478e-02 5.147e-02 1.647 0.099505 .
## regionMiamiFtLauderdale:quarter3 -7.307e-02 5.147e-02 -1.420 0.155672
## regionMidsouth:quarter3 9.249e-02 5.147e-02 1.797 0.072360 .
## regionNashville:quarter3 4.167e-02 5.147e-02 0.810 0.418085
## regionNewOrleansMobile:quarter3 7.109e-02 5.147e-02 1.381 0.167222
## regionNewYork:quarter3 1.121e-01 5.147e-02 2.177 0.029476 *
## regionNortheast:quarter3 4.725e-02 5.147e-02 0.918 0.358649
## regionNorthernNewEngland:quarter3 -1.389e-02 5.147e-02 -0.270 0.787273
## regionOrlando:quarter3 1.156e-01 5.147e-02 2.245 0.024762 *
## regionPhiladelphia:quarter3 8.202e-02 5.147e-02 1.594 0.111012
## regionPhoenixTucson:quarter3 2.603e-01 5.147e-02 5.058 4.27e-07 ***
## regionPittsburgh:quarter3 -1.622e-02 5.147e-02 -0.315 0.752619
## regionPlains:quarter3 1.348e-01 5.147e-02 2.619 0.008837 **
## regionPortland:quarter3 3.344e-01 5.147e-02 6.498 8.33e-11 ***
## regionRaleighGreensboro:quarter3 1.211e-01 5.147e-02 2.354 0.018600 *
## regionRichmondNorfolk:quarter3 5.134e-02 5.147e-02 0.998 0.318528
## regionRoanoke:quarter3 9.037e-02 5.147e-02 1.756 0.079127 .
## regionSacramento:quarter3 1.815e-01 5.147e-02 3.527 0.000421 ***
## regionSanDiego:quarter3 2.805e-01 5.147e-02 5.451 5.08e-08 ***
## regionSanFrancisco:quarter3 3.126e-01 5.147e-02 6.074 1.27e-09 ***
## regionSeattle:quarter3 3.922e-01 5.147e-02 7.620 2.66e-14 ***
## regionSouthCarolina:quarter3 1.023e-01 5.147e-02 1.987 0.046905 *
## regionSouthCentral:quarter3 4.390e-02 5.147e-02 0.853 0.393732
## regionSoutheast:quarter3 1.067e-01 5.148e-02 2.073 0.038179 *
## regionSpokane:quarter3 3.937e-01 5.147e-02 7.650 2.11e-14 ***
## regionStLouis:quarter3 1.916e-01 5.147e-02 3.723 0.000197 ***
## regionSyracuse:quarter3 -3.686e-02 5.147e-02 -0.716 0.473930
## regionTampa:quarter3 -4.372e-02 5.147e-02 -0.850 0.395566
## regionTotalUS:quarter3 9.405e-02 5.156e-02 1.824 0.068183 .
## regionWest:quarter3 2.980e-01 5.147e-02 5.791 7.13e-09 ***
## regionWestTexNewMexico:quarter3 1.785e-01 5.147e-02 3.469 0.000523 ***
## regionAtlanta:quarter4 1.130e-01 5.110e-02 2.210 0.027086 *
## regionBaltimoreWashington:quarter4 8.270e-02 5.110e-02 1.618 0.105581
## regionBoise:quarter4 1.538e-01 5.110e-02 3.009 0.002626 **
## regionBoston:quarter4 -1.075e-01 5.110e-02 -2.105 0.035345 *
## regionBuffaloRochester:quarter4 -1.019e-01 5.110e-02 -1.994 0.046129 *
## regionCalifornia:quarter4 2.298e-01 5.110e-02 4.496 6.96e-06 ***
## regionCharlotte:quarter4 8.383e-02 5.110e-02 1.641 0.100897
## regionChicago:quarter4 1.274e-01 5.110e-02 2.493 0.012665 *
## regionCincinnatiDayton:quarter4 1.341e-01 5.110e-02 2.625 0.008682 **
## regionColumbus:quarter4 5.517e-02 5.110e-02 1.080 0.280283
## regionDallasFtWorth:quarter4 9.257e-02 5.110e-02 1.811 0.070085 .
## regionDenver:quarter4 1.424e-01 5.110e-02 2.787 0.005319 **
## regionDetroit:quarter4 6.867e-02 5.110e-02 1.344 0.179058
## regionGrandRapids:quarter4 8.416e-02 5.110e-02 1.647 0.099577 .
## regionGreatLakes:quarter4 8.786e-02 5.111e-02 1.719 0.085637 .
## regionHarrisburgScranton:quarter4 -1.870e-02 5.110e-02 -0.366 0.714421
## regionHartfordSpringfield:quarter4 7.018e-03 5.110e-02 0.137 0.890763
## regionHouston:quarter4 1.181e-01 5.110e-02 2.311 0.020852 *
## regionIndianapolis:quarter4 8.610e-02 5.110e-02 1.685 0.092029 .
## regionJacksonville:quarter4 6.326e-02 5.110e-02 1.238 0.215724
## regionLasVegas:quarter4 2.517e-01 5.110e-02 4.925 8.50e-07 ***
## regionLosAngeles:quarter4 2.225e-01 5.110e-02 4.354 1.34e-05 ***
## regionLouisville:quarter4 7.942e-02 5.110e-02 1.554 0.120131
## regionMiamiFtLauderdale:quarter4 -7.519e-03 5.110e-02 -0.147 0.883012
## regionMidsouth:quarter4 8.252e-02 5.110e-02 1.615 0.106367
## regionNashville:quarter4 6.942e-02 5.110e-02 1.358 0.174327
## regionNewOrleansMobile:quarter4 1.065e-01 5.110e-02 2.085 0.037083 *
## regionNewYork:quarter4 6.475e-02 5.110e-02 1.267 0.205122
## regionNortheast:quarter4 -1.518e-02 5.110e-02 -0.297 0.766432
## regionNorthernNewEngland:quarter4 -2.143e-02 5.110e-02 -0.419 0.674965
## regionOrlando:quarter4 7.442e-02 5.110e-02 1.456 0.145298
## regionPhiladelphia:quarter4 4.312e-02 5.110e-02 0.844 0.398758
## regionPhoenixTucson:quarter4 2.254e-01 5.110e-02 4.412 1.03e-05 ***
## regionPittsburgh:quarter4 -4.100e-02 5.110e-02 -0.802 0.422337
## regionPlains:quarter4 1.232e-01 5.110e-02 2.411 0.015911 *
## regionPortland:quarter4 1.827e-01 5.110e-02 3.575 0.000351 ***
## regionRaleighGreensboro:quarter4 1.000e-01 5.110e-02 1.957 0.050323 .
## regionRichmondNorfolk:quarter4 3.477e-02 5.110e-02 0.680 0.496243
## regionRoanoke:quarter4 3.614e-02 5.110e-02 0.707 0.479434
## regionSacramento:quarter4 1.114e-01 5.110e-02 2.179 0.029314 *
## regionSanDiego:quarter4 2.529e-01 5.110e-02 4.949 7.54e-07 ***
## regionSanFrancisco:quarter4 2.213e-01 5.110e-02 4.331 1.49e-05 ***
## regionSeattle:quarter4 1.990e-01 5.110e-02 3.895 9.85e-05 ***
## regionSouthCarolina:quarter4 8.128e-02 5.110e-02 1.591 0.111701
## regionSouthCentral:quarter4 9.805e-02 5.110e-02 1.919 0.055046 .
## regionSoutheast:quarter4 8.435e-02 5.110e-02 1.651 0.098813 .
## regionSpokane:quarter4 2.581e-01 5.110e-02 5.051 4.44e-07 ***
## regionStLouis:quarter4 1.303e-02 5.110e-02 0.255 0.798777
## regionSyracuse:quarter4 -8.261e-02 5.110e-02 -1.617 0.105954
## regionTampa:quarter4 4.048e-02 5.110e-02 0.792 0.428229
## regionTotalUS:quarter4 1.204e-01 5.117e-02 2.352 0.018680 *
## regionWest:quarter4 1.619e-01 5.110e-02 3.169 0.001534 **
## regionWestTexNewMexico:quarter4 2.119e-01 5.119e-02 4.140 3.49e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2419 on 18028 degrees of freedom
## Multiple R-squared: 0.6434, Adjusted R-squared: 0.639
## F-statistic: 147.8 on 220 and 18028 DF, p-value: < 2.2e-16
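Scrolling back through several long summaries to compare fits is error-prone, so it can help to gather adjusted R-squared and the number of coefficients into one small table. The sketch below is a minimal example, assuming a hypothetical helper `compare_adj_r2()`. It is demonstrated on the built-in `mtcars` data so that it runs on its own.

```r
# Hypothetical helper: adjusted R-squared and coefficient count for each model
compare_adj_r2 <- function(models) {
  data.frame(
    model = names(models),
    n_coef = vapply(models, function(m) length(coef(m)), integer(1), USE.NAMES = FALSE),
    adj_r2 = vapply(models, function(m) summary(m)$adj.r.squared, numeric(1), USE.NAMES = FALSE)
  )
}

# Demo on built-in data (mtcars), not the avocado data
cars <- transform(mtcars, am = factor(am))
demo_models <- list(
  main_effects = lm(mpg ~ wt + am, data = cars),
  with_interaction = lm(mpg ~ wt + am + wt:am, data = cars)
)
comparison <- compare_adj_r2(demo_models)
print(comparison)
```

In this script the list could be built from the fitted candidates, for example `list(type_quarter = model5pb, type_year = model5pc, type_x_large_bags = model5pd, region_quarter = model5pe)`. From the summaries above, the adjusted R-squared values are 0.621, 0.6269, 0.6203 and 0.639 respectively, against 0.659 for the `type:region` model. Note that the `region:quarter` model spends 220 degrees of freedom, against 64 for the `type:quarter` model, to get there.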
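# Candidate interaction region:year: lets the year-on-year change differ by region
model5pf <- lm(average_price ~ type + region + quarter + year + x_large_bags + region:year,
               data = trimmed_avocados)
par(mfrow = c(2, 2)) # show the four diagnostic plots in a 2 x 2 grid
plot(model5pf)
summary(model5pf)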
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## x_large_bags + region:year, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.03187 -0.14124 -0.00167 0.13786 1.38842
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.175e+00 2.396e-02 49.024 < 2e-16 ***
## typeorganic 4.975e-01 3.662e-03 135.876 < 2e-16 ***
## regionAtlanta -1.582e-01 3.348e-02 -4.724 2.33e-06 ***
## regionBaltimoreWashington -1.699e-01 3.348e-02 -5.075 3.92e-07 ***
## regionBoise -1.650e-01 3.348e-02 -4.928 8.38e-07 ***
## regionBoston -6.519e-02 3.348e-02 -1.947 0.051546 .
## regionBuffaloRochester 5.865e-03 3.348e-02 0.175 0.860943
## regionCalifornia -2.234e-01 3.348e-02 -6.671 2.61e-11 ***
## regionCharlotte 3.702e-02 3.348e-02 1.106 0.268906
## regionChicago -1.348e-01 3.348e-02 -4.026 5.69e-05 ***
## regionCincinnatiDayton -3.367e-01 3.348e-02 -10.056 < 2e-16 ***
## regionColumbus -2.651e-01 3.348e-02 -7.919 2.54e-15 ***
## regionDallasFtWorth -4.610e-01 3.348e-02 -13.768 < 2e-16 ***
## regionDenver -3.510e-01 3.348e-02 -10.482 < 2e-16 ***
## regionDetroit -2.016e-01 3.349e-02 -6.021 1.77e-09 ***
## regionGrandRapids -1.226e-01 3.348e-02 -3.662 0.000251 ***
## regionGreatLakes -2.160e-01 3.353e-02 -6.442 1.21e-10 ***
## regionHarrisburgScranton -6.714e-02 3.348e-02 -2.005 0.044969 *
## regionHartfordSpringfield 2.090e-01 3.348e-02 6.243 4.38e-10 ***
## regionHouston -4.908e-01 3.348e-02 -14.659 < 2e-16 ***
## regionIndianapolis -1.960e-01 3.348e-02 -5.854 4.87e-09 ***
## regionJacksonville -3.567e-02 3.348e-02 -1.065 0.286701
## regionLasVegas -1.699e-01 3.348e-02 -5.074 3.93e-07 ***
## regionLosAngeles -3.867e-01 3.348e-02 -11.549 < 2e-16 ***
## regionLouisville -2.444e-01 3.348e-02 -7.300 3.00e-13 ***
## regionMiamiFtLauderdale -1.552e-01 3.348e-02 -4.635 3.59e-06 ***
## regionMidsouth -1.876e-01 3.348e-02 -5.603 2.13e-08 ***
## regionNashville -2.616e-01 3.348e-02 -7.812 5.95e-15 ***
## regionNewOrleansMobile -2.711e-01 3.348e-02 -8.095 6.07e-16 ***
## regionNewYork 1.058e-01 3.348e-02 3.159 0.001587 **
## regionNortheast 4.958e-03 3.348e-02 0.148 0.882293
## regionNorthernNewEngland -6.538e-02 3.348e-02 -1.953 0.050862 .
## regionOrlando -3.943e-02 3.348e-02 -1.177 0.239023
## regionPhiladelphia 1.644e-02 3.348e-02 0.491 0.623441
## regionPhoenixTucson -3.816e-01 3.348e-02 -11.398 < 2e-16 ***
## regionPittsburgh -1.315e-01 3.348e-02 -3.929 8.58e-05 ***
## regionPlains -1.011e-01 3.348e-02 -3.019 0.002543 **
## regionPortland -2.320e-01 3.348e-02 -6.928 4.41e-12 ***
## regionRaleighGreensboro -8.933e-02 3.348e-02 -2.668 0.007641 **
## regionRichmondNorfolk -2.642e-01 3.348e-02 -7.892 3.15e-15 ***
## regionRoanoke -3.116e-01 3.348e-02 -9.307 < 2e-16 ***
## regionSacramento -8.471e-02 3.348e-02 -2.530 0.011414 *
## regionSanDiego -2.645e-01 3.348e-02 -7.900 2.94e-15 ***
## regionSanFrancisco 8.230e-02 3.348e-02 2.458 0.013980 *
## regionSeattle -1.166e-01 3.348e-02 -3.481 0.000501 ***
## regionSouthCarolina -8.404e-02 3.348e-02 -2.510 0.012085 *
## regionSouthCentral -4.272e-01 3.348e-02 -12.759 < 2e-16 ***
## regionSoutheast -1.241e-01 3.348e-02 -3.705 0.000212 ***
## regionSpokane -1.384e-01 3.348e-02 -4.132 3.61e-05 ***
## regionStLouis -3.539e-02 3.348e-02 -1.057 0.290606
## regionSyracuse -9.711e-03 3.348e-02 -0.290 0.771793
## regionTampa -1.821e-01 3.348e-02 -5.439 5.42e-08 ***
## regionTotalUS -2.864e-01 3.358e-02 -8.531 < 2e-16 ***
## regionWest -3.011e-01 3.348e-02 -8.992 < 2e-16 ***
## regionWestTexNewMexico -2.767e-01 3.356e-02 -8.243 < 2e-16 ***
## quarter2 8.069e-02 5.265e-03 15.325 < 2e-16 ***
## quarter3 2.184e-01 5.268e-03 41.450 < 2e-16 ***
## quarter4 1.620e-01 5.229e-03 30.989 < 2e-16 ***
## year2016 -4.861e-03 3.348e-02 -0.145 0.884570
## year2017 9.815e-02 3.332e-02 2.945 0.003230 **
## year2018 1.233e-02 5.477e-02 0.225 0.821920
## x_large_bags 2.575e-07 1.277e-07 2.017 0.043674 *
## regionAtlanta:year2016 -1.617e-01 4.735e-02 -3.414 0.000641 ***
## regionBaltimoreWashington:year2016 2.234e-01 4.735e-02 4.718 2.40e-06 ***
## regionBoise:year2016 -2.270e-01 4.735e-02 -4.793 1.65e-06 ***
## regionBoston:year2016 -4.263e-02 4.735e-02 -0.900 0.367957
## regionBuffaloRochester:year2016 -5.600e-02 4.735e-02 -1.183 0.237011
## regionCalifornia:year2016 1.603e-02 4.737e-02 0.338 0.735153
## regionCharlotte:year2016 -7.308e-02 4.735e-02 -1.543 0.122784
## regionChicago:year2016 1.479e-01 4.735e-02 3.123 0.001794 **
## regionCincinnatiDayton:year2016 -1.091e-01 4.735e-02 -2.304 0.021212 *
## regionColumbus:year2016 -8.266e-02 4.735e-02 -1.746 0.080880 .
## regionDallasFtWorth:year2016 -7.729e-02 4.735e-02 -1.632 0.102631
## regionDenver:year2016 -8.983e-02 4.735e-02 -1.897 0.057824 .
## regionDetroit:year2016 -1.613e-01 4.735e-02 -3.406 0.000661 ***
## regionGrandRapids:year2016 9.775e-02 4.735e-02 2.064 0.038991 *
## regionGreatLakes:year2016 -4.595e-02 4.736e-02 -0.970 0.331929
## regionHarrisburgScranton:year2016 4.470e-02 4.735e-02 0.944 0.345203
## regionHartfordSpringfield:year2016 1.080e-01 4.735e-02 2.282 0.022511 *
## regionHouston:year2016 -5.176e-02 4.735e-02 -1.093 0.274356
## regionIndianapolis:year2016 -3.665e-02 4.735e-02 -0.774 0.438916
## regionJacksonville:year2016 -1.307e-01 4.735e-02 -2.759 0.005797 **
## regionLasVegas:year2016 -1.159e-02 4.735e-02 -0.245 0.806598
## regionLosAngeles:year2016 -6.631e-02 4.737e-02 -1.400 0.161523
## regionLouisville:year2016 -7.807e-02 4.735e-02 -1.649 0.099214 .
## regionMiamiFtLauderdale:year2016 -9.933e-02 4.735e-02 -2.098 0.035939 *
## regionMidsouth:year2016 3.128e-03 4.736e-02 0.066 0.947330
## regionNashville:year2016 -1.562e-01 4.735e-02 -3.299 0.000971 ***
## regionNewOrleansMobile:year2016 -1.471e-02 4.735e-02 -0.311 0.756016
## regionNewYork:year2016 1.222e-01 4.735e-02 2.581 0.009869 **
## regionNortheast:year2016 5.556e-02 4.736e-02 1.173 0.240727
## regionNorthernNewEngland:year2016 -7.595e-02 4.735e-02 -1.604 0.108755
## regionOrlando:year2016 -1.241e-01 4.735e-02 -2.620 0.008803 **
## regionPhiladelphia:year2016 1.244e-01 4.735e-02 2.627 0.008634 **
## regionPhoenixTucson:year2016 1.065e-01 4.735e-02 2.248 0.024571 *
## regionPittsburgh:year2016 -5.907e-02 4.735e-02 -1.247 0.212245
## regionPlains:year2016 -5.670e-02 4.736e-02 -1.197 0.231193
## regionPortland:year2016 -1.103e-01 4.735e-02 -2.330 0.019806 *
## regionRaleighGreensboro:year2016 3.099e-03 4.735e-02 0.065 0.947823
## regionRichmondNorfolk:year2016 -5.864e-02 4.735e-02 -1.238 0.215586
## regionRoanoke:year2016 -7.486e-02 4.735e-02 -1.581 0.113932
## regionSacramento:year2016 2.189e-01 4.735e-02 4.623 3.80e-06 ***
## regionSanDiego:year2016 4.432e-02 4.735e-02 0.936 0.349308
## regionSanFrancisco:year2016 2.650e-01 4.735e-02 5.596 2.23e-08 ***
## regionSeattle:year2016 -1.171e-01 4.735e-02 -2.473 0.013403 *
## regionSouthCarolina:year2016 -1.449e-01 4.735e-02 -3.060 0.002215 **
## regionSouthCentral:year2016 -8.303e-02 4.737e-02 -1.753 0.079655 .
## regionSoutheast:year2016 -1.250e-01 4.736e-02 -2.640 0.008305 **
## regionSpokane:year2016 -6.197e-02 4.735e-02 -1.309 0.190631
## regionStLouis:year2016 -3.134e-01 4.735e-02 -6.619 3.73e-11 ***
## regionSyracuse:year2016 -2.077e-02 4.735e-02 -0.439 0.660952
## regionTampa:year2016 -8.761e-02 4.735e-02 -1.850 0.064290 .
## regionTotalUS:year2016 -2.523e-03 4.782e-02 -0.053 0.957917
## regionWest:year2016 -5.260e-02 4.735e-02 -1.111 0.266681
## regionWestTexNewMexico:year2016 -1.118e-02 4.741e-02 -0.236 0.813538
## regionAtlanta:year2017 -5.133e-02 4.713e-02 -1.089 0.276065
## regionBaltimoreWashington:year2017 2.113e-01 4.713e-02 4.483 7.40e-06 ***
## regionBoise:year2017 1.985e-02 4.713e-02 0.421 0.673552
## regionBoston:year2017 1.068e-01 4.713e-02 2.267 0.023398 *
## regionBuffaloRochester:year2017 -5.601e-02 4.713e-02 -1.189 0.234633
## regionCalifornia:year2017 1.126e-01 4.723e-02 2.384 0.017123 *
## regionCharlotte:year2017 9.489e-02 4.713e-02 2.013 0.044081 *
## regionChicago:year2017 2.114e-01 4.713e-02 4.486 7.31e-06 ***
## regionCincinnatiDayton:year2017 1.831e-02 4.713e-02 0.389 0.697600
## regionColumbus:year2017 -5.701e-02 4.713e-02 -1.210 0.226414
## regionDallasFtWorth:year2017 1.122e-04 4.713e-02 0.002 0.998100
## regionDenver:year2017 7.089e-02 4.713e-02 1.504 0.132563
## regionDetroit:year2017 -9.823e-02 4.713e-02 -2.084 0.037149 *
## regionGrandRapids:year2017 1.115e-01 4.713e-02 2.366 0.017969 *
## regionGreatLakes:year2017 -2.282e-03 4.713e-02 -0.048 0.961394
## regionHarrisburgScranton:year2017 2.497e-02 4.713e-02 0.530 0.596285
## regionHartfordSpringfield:year2017 4.139e-02 4.713e-02 0.878 0.379792
## regionHouston:year2017 -4.291e-02 4.713e-02 -0.911 0.362553
## regionIndianapolis:year2017 -1.111e-01 4.713e-02 -2.357 0.018445 *
## regionJacksonville:year2017 6.928e-02 4.713e-02 1.470 0.141549
## regionLasVegas:year2017 -5.007e-02 4.713e-02 -1.062 0.288096
## regionLosAngeles:year2017 1.211e-01 4.719e-02 2.566 0.010299 *
## regionLouisville:year2017 -3.632e-02 4.713e-02 -0.771 0.440950
## regionMiamiFtLauderdale:year2017 1.547e-01 4.713e-02 3.282 0.001034 **
## regionMidsouth:year2017 6.894e-02 4.713e-02 1.463 0.143549
## regionNashville:year2017 -1.364e-01 4.713e-02 -2.894 0.003810 **
## regionNewOrleansMobile:year2017 5.163e-02 4.713e-02 1.095 0.273319
## regionNewYork:year2017 6.570e-02 4.713e-02 1.394 0.163320
## regionNortheast:year2017 4.965e-02 4.713e-02 1.053 0.292145
## regionNorthernNewEngland:year2017 4.649e-03 4.713e-02 0.099 0.921414
## regionOrlando:year2017 8.160e-02 4.713e-02 1.731 0.083380 .
## regionPhiladelphia:year2017 5.293e-02 4.713e-02 1.123 0.261438
## regionPhoenixTucson:year2017 1.622e-02 4.713e-02 0.344 0.730705
## regionPittsburgh:year2017 -1.434e-01 4.713e-02 -3.042 0.002352 **
## regionPlains:year2017 -2.733e-02 4.713e-02 -0.580 0.561981
## regionPortland:year2017 2.846e-02 4.713e-02 0.604 0.545918
## regionRaleighGreensboro:year2017 2.201e-01 4.713e-02 4.671 3.02e-06 ***
## regionRichmondNorfolk:year2017 2.554e-02 4.713e-02 0.542 0.587832
## regionRoanoke:year2017 3.208e-02 4.713e-02 0.681 0.496074
## regionSacramento:year2017 2.207e-01 4.713e-02 4.683 2.84e-06 ***
## regionSanDiego:year2017 2.111e-01 4.713e-02 4.478 7.57e-06 ***
## regionSanFrancisco:year2017 2.455e-01 4.713e-02 5.210 1.91e-07 ***
## regionSeattle:year2017 7.804e-02 4.713e-02 1.656 0.097776 .
## regionSouthCarolina:year2017 -7.433e-02 4.713e-02 -1.577 0.114754
## regionSouthCentral:year2017 -4.930e-02 4.713e-02 -1.046 0.295573
## regionSoutheast:year2017 -5.160e-03 4.716e-02 -0.109 0.912881
## regionSpokane:year2017 1.051e-01 4.713e-02 2.230 0.025752 *
## regionStLouis:year2017 -1.074e-02 4.713e-02 -0.228 0.819670
## regionSyracuse:year2017 -3.869e-02 4.713e-02 -0.821 0.411693
## regionTampa:year2017 1.635e-01 4.713e-02 3.469 0.000525 ***
## regionTotalUS:year2017 6.318e-02 4.787e-02 1.320 0.186917
## regionWest:year2017 5.256e-02 4.713e-02 1.115 0.264729
## regionWestTexNewMexico:year2017 -7.553e-02 4.730e-02 -1.597 0.110308
## regionAtlanta:year2018 1.072e-02 7.733e-02 0.139 0.889782
## regionBaltimoreWashington:year2018 1.124e-01 7.733e-02 1.453 0.146104
## regionBoise:year2018 2.217e-01 7.733e-02 2.867 0.004149 **
## regionBoston:year2018 2.059e-01 7.733e-02 2.663 0.007742 **
## regionBuffaloRochester:year2018 -2.155e-01 7.733e-02 -2.787 0.005332 **
## regionCalifornia:year2018 1.892e-01 7.746e-02 2.443 0.014578 *
## regionCharlotte:year2018 9.679e-03 7.733e-02 0.125 0.900390
## regionChicago:year2018 2.606e-01 7.733e-02 3.370 0.000753 ***
## regionCincinnatiDayton:year2018 1.764e-01 7.733e-02 2.281 0.022553 *
## regionColumbus:year2018 9.079e-04 7.733e-02 0.012 0.990632
## regionDallasFtWorth:year2018 1.265e-01 7.733e-02 1.636 0.101773
## regionDenver:year2018 1.960e-01 7.733e-02 2.535 0.011261 *
## regionDetroit:year2018 -5.785e-02 7.733e-02 -0.748 0.454421
## regionGrandRapids:year2018 1.287e-02 7.733e-02 0.166 0.867847
## regionGreatLakes:year2018 4.963e-02 7.737e-02 0.642 0.521201
## regionHarrisburgScranton:year2018 -3.216e-02 7.733e-02 -0.416 0.677506
## regionHartfordSpringfield:year2018 3.256e-02 7.733e-02 0.421 0.673717
## regionHouston:year2018 9.701e-02 7.733e-02 1.255 0.209633
## regionIndianapolis:year2018 -7.169e-02 7.733e-02 -0.927 0.353907
## regionJacksonville:year2018 5.652e-02 7.733e-02 0.731 0.464859
## regionLasVegas:year2018 1.278e-01 7.733e-02 1.653 0.098411 .
## regionLosAngeles:year2018 2.964e-01 7.738e-02 3.831 0.000128 ***
## regionLouisville:year2018 7.646e-02 7.733e-02 0.989 0.322751
## regionMiamiFtLauderdale:year2018 6.354e-02 7.733e-02 0.822 0.411224
## regionMidsouth:year2018 1.091e-01 7.733e-02 1.411 0.158407
## regionNashville:year2018 4.803e-02 7.733e-02 0.621 0.534514
## regionNewOrleansMobile:year2018 3.935e-02 7.733e-02 0.509 0.610832
## regionNewYork:year2018 3.283e-02 7.733e-02 0.425 0.671123
## regionNortheast:year2018 3.237e-02 7.733e-02 0.419 0.675477
## regionNorthernNewEngland:year2018 5.076e-02 7.733e-02 0.656 0.511534
## regionOrlando:year2018 -4.181e-02 7.733e-02 -0.541 0.588694
## regionPhiladelphia:year2018 -3.644e-03 7.733e-02 -0.047 0.962413
## regionPhoenixTucson:year2018 1.001e-01 7.733e-02 1.295 0.195404
## regionPittsburgh:year2018 -2.885e-02 7.733e-02 -0.373 0.709069
## regionPlains:year2018 2.461e-02 7.733e-02 0.318 0.750245
## regionPortland:year2018 1.923e-01 7.733e-02 2.487 0.012901 *
## regionRaleighGreensboro:year2018 1.885e-01 7.733e-02 2.438 0.014775 *
## regionRichmondNorfolk:year2018 6.336e-02 7.733e-02 0.819 0.412568
## regionRoanoke:year2018 1.616e-01 7.733e-02 2.090 0.036623 *
## regionSacramento:year2018 1.202e-01 7.733e-02 1.555 0.119988
## regionSanDiego:year2018 3.063e-01 7.733e-02 3.962 7.48e-05 ***
## regionSanFrancisco:year2018 3.107e-02 7.733e-02 0.402 0.687859
## regionSeattle:year2018 1.357e-01 7.733e-02 1.754 0.079378 .
## regionSouthCarolina:year2018 -8.383e-02 7.733e-02 -1.084 0.278339
## regionSouthCentral:year2018 9.105e-02 7.736e-02 1.177 0.239230
## regionSoutheast:year2018 -1.071e-02 7.733e-02 -0.139 0.889809
## regionSpokane:year2018 1.276e-01 7.733e-02 1.650 0.099053 .
## regionStLouis:year2018 6.527e-02 7.733e-02 0.844 0.398628
## regionSyracuse:year2018 -1.757e-01 7.733e-02 -2.272 0.023105 *
## regionTampa:year2018 7.714e-02 7.733e-02 0.998 0.318505
## regionTotalUS:year2018 1.279e-01 7.829e-02 1.633 0.102468
## regionWest:year2018 1.602e-01 7.733e-02 2.072 0.038312 *
## regionWestTexNewMexico:year2018 9.183e-02 7.736e-02 1.187 0.235229
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2414 on 18028 degrees of freedom
## Multiple R-squared: 0.6448, Adjusted R-squared: 0.6405
## F-statistic: 148.8 on 220 and 18028 DF, p-value: < 2.2e-16
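Adding region:year takes the model to 220 coefficients for an adjusted R-squared of 0.6405, so it is worth checking formally whether an interaction like this earns its place. One way to do that is an F-test between the nested models with `anova()`. The sketch below runs on simulated stand-in data: the `toy` data frame, its columns and the effect sizes are all made up for illustration and are not the avocado data.

```r
# Simulated stand-in data; NOT the avocado dataset.
set.seed(42)
n <- 600
toy <- data.frame(
  region = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  year   = factor(sample(c("2015", "2016", "2017"), n, replace = TRUE)),
  type   = factor(sample(c("conventional", "organic"), n, replace = TRUE))
)

# Build a price where the year effect differs by region (a true interaction)
toy$price <- 1 +
  0.5 * (toy$type == "organic") +
  0.3 * (toy$region == "B" & toy$year == "2017") +
  rnorm(n, sd = 0.2)

base_model        <- lm(price ~ type + region + year, data = toy)
interaction_model <- lm(price ~ type + region + year + region:year, data = toy)

# F-test of the nested models: does region:year explain extra variance?
comparison <- anova(base_model, interaction_model)
print(comparison)

# Adjusted R-squared penalises the extra coefficients the interaction adds
adj_r2 <- c(
  base        = summary(base_model)$adj.r.squared,
  interaction = summary(interaction_model)$adj.r.squared
)
print(round(adj_r2, 4))
```

On the real data the same pattern would compare the model without the interaction against `model5pf`. With roughly 18,000 rows almost any interaction is likely to come out statistically significant, so the size of the change in adjusted R-squared is probably the more useful number to read.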
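# Candidate interaction: region:x_large_bags (lets the x_large_bags slope differ by region)
model5pg <- lm(average_price ~ type + region + quarter + year + x_large_bags + region:x_large_bags, data = trimmed_avocados)
# Show the four diagnostic plots in a 2x2 grid
par(mfrow = c(2, 2))
plot(model5pg)
summary(model5pg)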
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## x_large_bags + region:x_large_bags, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.00590 -0.14516 -0.00347 0.14267 1.44125
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.157e+00 1.475e-02 78.422 < 2e-16 ***
## typeorganic 4.999e-01 4.037e-03 123.829 < 2e-16 ***
## regionAtlanta -2.087e-01 2.013e-02 -10.371 < 2e-16 ***
## regionBaltimoreWashington -3.231e-02 1.986e-02 -1.627 0.103767
## regionBoise -1.940e-01 1.977e-02 -9.809 < 2e-16 ***
## regionBoston -3.887e-02 1.994e-02 -1.949 0.051298 .
## regionBuffaloRochester -3.757e-02 1.981e-02 -1.896 0.057934 .
## regionCalifornia -1.486e-01 2.138e-02 -6.952 3.73e-12 ***
## regionCharlotte 5.896e-02 1.972e-02 2.989 0.002799 **
## regionChicago -1.992e-02 2.044e-02 -0.975 0.329738
## regionCincinnatiDayton -3.552e-01 2.050e-02 -17.331 < 2e-16 ***
## regionColumbus -3.118e-01 2.049e-02 -15.216 < 2e-16 ***
## regionDallasFtWorth -4.683e-01 1.983e-02 -23.612 < 2e-16 ***
## regionDenver -3.389e-01 1.973e-02 -17.180 < 2e-16 ***
## regionDetroit -3.106e-01 2.107e-02 -14.743 < 2e-16 ***
## regionGrandRapids -7.090e-02 2.040e-02 -3.475 0.000512 ***
## regionGreatLakes -2.471e-01 2.142e-02 -11.533 < 2e-16 ***
## regionHarrisburgScranton -3.937e-02 1.982e-02 -1.986 0.047013 *
## regionHartfordSpringfield 2.737e-01 1.994e-02 13.724 < 2e-16 ***
## regionHouston -5.100e-01 1.971e-02 -25.876 < 2e-16 ***
## regionIndianapolis -2.525e-01 2.077e-02 -12.158 < 2e-16 ***
## regionJacksonville -3.545e-02 1.987e-02 -1.784 0.074463 .
## regionLasVegas -1.643e-01 1.982e-02 -8.289 < 2e-16 ***
## regionLosAngeles -3.536e-01 2.157e-02 -16.388 < 2e-16 ***
## regionLouisville -2.774e-01 2.048e-02 -13.545 < 2e-16 ***
## regionMiamiFtLauderdale -1.309e-01 1.981e-02 -6.605 4.08e-11 ***
## regionMidsouth -1.541e-01 2.005e-02 -7.685 1.60e-14 ***
## regionNashville -3.399e-01 2.060e-02 -16.496 < 2e-16 ***
## regionNewOrleansMobile -2.666e-01 2.032e-02 -13.122 < 2e-16 ***
## regionNewYork 1.685e-01 1.998e-02 8.431 < 2e-16 ***
## regionNortheast 4.073e-02 1.997e-02 2.039 0.041431 *
## regionNorthernNewEngland -8.220e-02 1.977e-02 -4.158 3.22e-05 ***
## regionOrlando -3.994e-02 1.979e-02 -2.019 0.043541 *
## regionPhiladelphia 7.529e-02 1.985e-02 3.792 0.000150 ***
## regionPhoenixTucson -2.935e-01 1.998e-02 -14.689 < 2e-16 ***
## regionPittsburgh -1.966e-01 1.969e-02 -9.988 < 2e-16 ***
## regionPlains -1.101e-01 2.035e-02 -5.409 6.42e-08 ***
## regionPortland -2.287e-01 2.014e-02 -11.354 < 2e-16 ***
## regionRaleighGreensboro 8.980e-03 1.965e-02 0.457 0.647707
## regionRichmondNorfolk -2.639e-01 1.975e-02 -13.361 < 2e-16 ***
## regionRoanoke -3.083e-01 1.979e-02 -15.577 < 2e-16 ***
## regionSacramento 9.105e-02 2.024e-02 4.498 6.89e-06 ***
## regionSanDiego -1.403e-01 2.038e-02 -6.887 5.89e-12 ***
## regionSanFrancisco 2.908e-01 2.051e-02 14.180 < 2e-16 ***
## regionSeattle -1.056e-01 2.097e-02 -5.035 4.81e-07 ***
## regionSouthCarolina -1.508e-01 2.000e-02 -7.541 4.87e-14 ***
## regionSouthCentral -4.547e-01 2.026e-02 -22.448 < 2e-16 ***
## regionSoutheast -1.581e-01 2.016e-02 -7.843 4.63e-15 ***
## regionSpokane -8.415e-02 2.025e-02 -4.156 3.25e-05 ***
## regionStLouis -1.118e-01 1.972e-02 -5.670 1.45e-08 ***
## regionSyracuse -3.950e-02 1.975e-02 -2.000 0.045480 *
## regionTampa -1.470e-01 1.980e-02 -7.427 1.16e-13 ***
## regionTotalUS -2.436e-01 2.125e-02 -11.463 < 2e-16 ***
## regionWest -2.673e-01 2.074e-02 -12.891 < 2e-16 ***
## regionWestTexNewMexico -2.812e-01 1.972e-02 -14.260 < 2e-16 ***
## quarter2 7.926e-02 5.408e-03 14.654 < 2e-16 ***
## quarter3 2.156e-01 5.472e-03 39.402 < 2e-16 ***
## quarter4 1.645e-01 5.346e-03 30.762 < 2e-16 ***
## year2016 -3.867e-02 4.730e-03 -8.175 3.14e-16 ***
## year2017 1.399e-01 4.764e-03 29.355 < 2e-16 ***
## year2018 9.389e-02 8.481e-03 11.071 < 2e-16 ***
## x_large_bags 6.780e-05 3.165e-05 2.142 0.032202 *
## regionAtlanta:x_large_bags -7.465e-05 3.229e-05 -2.311 0.020817 *
## regionBaltimoreWashington:x_large_bags -4.459e-05 3.238e-05 -1.377 0.168604
## regionBoise:x_large_bags -3.981e-04 1.293e-04 -3.079 0.002077 **
## regionBoston:x_large_bags 1.603e-06 3.660e-05 0.044 0.965060
## regionBuffaloRochester:x_large_bags -5.922e-05 3.576e-05 -1.656 0.097767 .
## regionCalifornia:x_large_bags -6.834e-05 3.165e-05 -2.159 0.030867 *
## regionCharlotte:x_large_bags -9.327e-05 3.610e-05 -2.584 0.009779 **
## regionChicago:x_large_bags -4.606e-05 3.215e-05 -1.433 0.151906
## regionCincinnatiDayton:x_large_bags -5.231e-05 3.275e-05 -1.597 0.110224
## regionColumbus:x_large_bags -4.902e-05 3.321e-05 -1.476 0.139968
## regionDallasFtWorth:x_large_bags -6.657e-05 3.181e-05 -2.093 0.036381 *
## regionDenver:x_large_bags -3.469e-05 3.926e-05 -0.884 0.376950
## regionDetroit:x_large_bags -6.075e-05 3.169e-05 -1.917 0.055232 .
## regionGrandRapids:x_large_bags -5.830e-05 3.174e-05 -1.837 0.066265 .
## regionGreatLakes:x_large_bags -6.604e-05 3.165e-05 -2.086 0.036958 *
## regionHarrisburgScranton:x_large_bags -6.708e-05 3.286e-05 -2.042 0.041194 *
## regionHartfordSpringfield:x_large_bags -9.982e-05 3.752e-05 -2.660 0.007810 **
## regionHouston:x_large_bags -6.198e-05 3.185e-05 -1.946 0.051713 .
## regionIndianapolis:x_large_bags -5.092e-05 3.283e-05 -1.551 0.120934
## regionJacksonville:x_large_bags -8.681e-05 3.455e-05 -2.513 0.011991 *
## regionLasVegas:x_large_bags -2.179e-04 9.195e-05 -2.370 0.017784 *
## regionLosAngeles:x_large_bags -6.637e-05 3.166e-05 -2.097 0.036039 *
## regionLouisville:x_large_bags -3.106e-05 3.772e-05 -0.823 0.410237
## regionMiamiFtLauderdale:x_large_bags -6.002e-05 3.195e-05 -1.879 0.060329 .
## regionMidsouth:x_large_bags -6.620e-05 3.167e-05 -2.090 0.036587 *
## regionNashville:x_large_bags -6.882e-05 3.798e-05 -1.812 0.070007 .
## regionNewOrleansMobile:x_large_bags -5.523e-05 3.188e-05 -1.732 0.083257 .
## regionNewYork:x_large_bags -6.142e-05 3.195e-05 -1.922 0.054590 .
## regionNortheast:x_large_bags -6.551e-05 3.167e-05 -2.069 0.038594 *
## regionNorthernNewEngland:x_large_bags -4.558e-05 3.371e-05 -1.352 0.176342
## regionOrlando:x_large_bags -7.638e-05 3.209e-05 -2.380 0.017329 *
## regionPhiladelphia:x_large_bags -5.342e-05 3.439e-05 -1.553 0.120351
## regionPhoenixTucson:x_large_bags -1.429e-04 3.331e-05 -4.291 1.79e-05 ***
## regionPittsburgh:x_large_bags -1.719e-05 3.737e-05 -0.460 0.645586
## regionPlains:x_large_bags -6.954e-05 3.170e-05 -2.194 0.028244 *
## regionPortland:x_large_bags -9.347e-05 3.944e-05 -2.370 0.017806 *
## regionRaleighGreensboro:x_large_bags -8.980e-05 3.359e-05 -2.674 0.007512 **
## regionRichmondNorfolk:x_large_bags -5.979e-05 3.324e-05 -1.799 0.072098 .
## regionRoanoke:x_large_bags -5.109e-05 3.583e-05 -1.426 0.153899
## regionSacramento:x_large_bags -1.031e-04 3.298e-05 -3.126 0.001774 **
## regionSanDiego:x_large_bags -1.000e-04 3.480e-05 -2.874 0.004055 **
## regionSanFrancisco:x_large_bags -1.300e-04 3.336e-05 -3.896 9.81e-05 ***
## regionSeattle:x_large_bags -8.839e-05 5.055e-05 -1.749 0.080361 .
## regionSouthCarolina:x_large_bags -6.511e-05 3.246e-05 -2.006 0.044848 *
## regionSouthCentral:x_large_bags -6.733e-05 3.166e-05 -2.127 0.033439 *
## regionSoutheast:x_large_bags -6.729e-05 3.166e-05 -2.126 0.033558 *
## regionSpokane:x_large_bags -1.130e-03 2.762e-04 -4.093 4.28e-05 ***
## regionStLouis:x_large_bags -8.268e-05 3.208e-05 -2.577 0.009971 **
## regionSyracuse:x_large_bags -7.061e-06 4.375e-05 -0.161 0.871772
## regionTampa:x_large_bags -6.270e-05 3.214e-05 -1.951 0.051068 .
## regionTotalUS:x_large_bags -6.764e-05 3.165e-05 -2.137 0.032614 *
## regionWest:x_large_bags -7.298e-05 3.178e-05 -2.297 0.021641 *
## regionWestTexNewMexico:x_large_bags -7.497e-05 3.184e-05 -2.354 0.018573 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2465 on 18134 degrees of freedom
## Multiple R-squared: 0.6276, Adjusted R-squared: 0.6253
## F-statistic: 268.1 on 114 and 18134 DF, p-value: < 2.2e-16
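With several candidate interactions on the table, reading each summary in turn gets unwieldy: region:year above reaches an adjusted R-squared of 0.6405 with 220 coefficients, while region:x_large_bags gets 0.6253 with 114. A small comparison table makes the trade-off easier to see. The sketch below is one way to build it in base R. It uses simulated stand-in data, and the `toy` data frame, the `candidates` list and the column names are all hypothetical, not taken from the avocado script.

```r
# Simulated stand-in data; NOT the avocado dataset.
set.seed(1)
n <- 400
toy <- data.frame(
  quarter = factor(sample(1:4, n, replace = TRUE)),
  region  = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  bags    = rexp(n, rate = 1 / 1000)
)
toy$price <- 1 + 0.1 * as.integer(toy$quarter) + rnorm(n, sd = 0.25)

# Candidate formulas: one base model plus one interaction at a time
candidates <- list(
  base           = price ~ quarter + region + bags,
  region_quarter = price ~ quarter + region + bags + region:quarter,
  region_bags    = price ~ quarter + region + bags + region:bags
)

# Fit each candidate and keep a few headline statistics
fit_stats <- do.call(rbind, lapply(names(candidates), function(nm) {
  fit <- lm(candidates[[nm]], data = toy)
  data.frame(
    model         = nm,
    n_coef        = length(coef(fit)),
    adj_r_squared = summary(fit)$adj.r.squared,
    aic           = AIC(fit)
  )
}))

# Best adjusted R-squared first
fit_stats <- fit_stats[order(-fit_stats$adj_r_squared), ]
print(fit_stats)
```

Keeping `n_coef` next to the fit statistics is a deliberate choice here: it shows at a glance how many extra terms each interaction costs for the improvement it buys.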
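# Candidate interaction: quarter:year (lets the quarterly effect differ by year)
model5ph <- lm(average_price ~ type + region + quarter + year + x_large_bags + quarter:year, data = trimmed_avocados)
# Show the four diagnostic plots in a 2x2 grid
par(mfrow = c(2, 2))
plot(model5ph)
summary(model5ph)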
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## x_large_bags + quarter:year, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.96209 -0.13588 -0.00192 0.13567 1.48311
##
## Coefficients: (3 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.259e+00 1.454e-02 86.603 < 2e-16 ***
## typeorganic 4.983e-01 3.630e-03 137.274 < 2e-16 ***
## regionAtlanta -2.233e-01 1.846e-02 -12.101 < 2e-16 ***
## regionBaltimoreWashington -2.699e-02 1.846e-02 -1.463 0.143613
## regionBoise -2.129e-01 1.846e-02 -11.533 < 2e-16 ***
## regionBoston -3.020e-02 1.846e-02 -1.636 0.101842
## regionBuffaloRochester -4.425e-02 1.846e-02 -2.397 0.016525 *
## regionCalifornia -1.717e-01 1.855e-02 -9.257 < 2e-16 ***
## regionCharlotte 4.497e-02 1.846e-02 2.437 0.014836 *
## regionChicago -4.646e-03 1.846e-02 -0.252 0.801250
## regionCincinnatiDayton -3.521e-01 1.846e-02 -19.077 < 2e-16 ***
## regionColumbus -3.085e-01 1.846e-02 -16.713 < 2e-16 ***
## regionDallasFtWorth -4.759e-01 1.846e-02 -25.784 < 2e-16 ***
## regionDenver -3.425e-01 1.846e-02 -18.556 < 2e-16 ***
## regionDetroit -2.868e-01 1.847e-02 -15.531 < 2e-16 ***
## regionGrandRapids -5.695e-02 1.846e-02 -3.085 0.002036 **
## regionGreatLakes -2.298e-01 1.859e-02 -12.358 < 2e-16 ***
## regionHarrisburgScranton -4.788e-02 1.846e-02 -2.594 0.009488 **
## regionHartfordSpringfield 2.576e-01 1.846e-02 13.955 < 2e-16 ***
## regionHouston -5.134e-01 1.846e-02 -27.819 < 2e-16 ***
## regionIndianapolis -2.473e-01 1.846e-02 -13.400 < 2e-16 ***
## regionJacksonville -5.016e-02 1.846e-02 -2.718 0.006578 **
## regionLasVegas -1.801e-01 1.846e-02 -9.758 < 2e-16 ***
## regionLosAngeles -3.497e-01 1.851e-02 -18.888 < 2e-16 ***
## regionLouisville -2.744e-01 1.846e-02 -14.869 < 2e-16 ***
## regionMiamiFtLauderdale -1.328e-01 1.846e-02 -7.198 6.36e-13 ***
## regionMidsouth -1.578e-01 1.846e-02 -8.548 < 2e-16 ***
## regionNashville -3.490e-01 1.846e-02 -18.910 < 2e-16 ***
## regionNewOrleansMobile -2.568e-01 1.846e-02 -13.913 < 2e-16 ***
## regionNewYork 1.662e-01 1.846e-02 9.004 < 2e-16 ***
## regionNortheast 3.943e-02 1.846e-02 2.136 0.032703 *
## regionNorthernNewEngland -8.372e-02 1.846e-02 -4.536 5.77e-06 ***
## regionOrlando -5.505e-02 1.846e-02 -2.983 0.002860 **
## regionPhiladelphia 7.102e-02 1.846e-02 3.848 0.000119 ***
## regionPhoenixTucson -3.367e-01 1.846e-02 -18.245 < 2e-16 ***
## regionPittsburgh -1.967e-01 1.846e-02 -10.659 < 2e-16 ***
## regionPlains -1.258e-01 1.846e-02 -6.812 9.90e-12 ***
## regionPortland -2.434e-01 1.846e-02 -13.185 < 2e-16 ***
## regionRaleighGreensboro -5.977e-03 1.846e-02 -0.324 0.746076
## regionRichmondNorfolk -2.698e-01 1.846e-02 -14.618 < 2e-16 ***
## regionRoanoke -3.131e-01 1.846e-02 -16.967 < 2e-16 ***
## regionSacramento 6.034e-02 1.846e-02 3.269 0.001079 **
## regionSanDiego -1.630e-01 1.846e-02 -8.831 < 2e-16 ***
## regionSanFrancisco 2.430e-01 1.846e-02 13.165 < 2e-16 ***
## regionSeattle -1.185e-01 1.846e-02 -6.420 1.40e-10 ***
## regionSouthCarolina -1.580e-01 1.846e-02 -8.559 < 2e-16 ***
## regionSouthCentral -4.628e-01 1.848e-02 -25.043 < 2e-16 ***
## regionSoutheast -1.659e-01 1.848e-02 -8.977 < 2e-16 ***
## regionSpokane -1.154e-01 1.846e-02 -6.253 4.12e-10 ***
## regionStLouis -1.306e-01 1.846e-02 -7.077 1.52e-12 ***
## regionSyracuse -4.071e-02 1.846e-02 -2.206 0.027420 *
## regionTampa -1.524e-01 1.846e-02 -8.258 < 2e-16 ***
## regionTotalUS -2.666e-01 1.998e-02 -13.348 < 2e-16 ***
## regionWest -2.897e-01 1.846e-02 -15.696 < 2e-16 ***
## regionWestTexNewMexico -2.969e-01 1.850e-02 -16.051 < 2e-16 ***
## quarter2 2.117e-02 9.056e-03 2.338 0.019420 *
## quarter3 8.279e-02 9.056e-03 9.142 < 2e-16 ***
## quarter4 -1.080e-02 9.058e-03 -1.192 0.233314
## year2016 -1.186e-01 9.059e-03 -13.097 < 2e-16 ***
## year2017 -5.756e-02 9.061e-03 -6.352 2.17e-10 ***
## year2018 -6.568e-03 9.262e-03 -0.709 0.478278
## x_large_bags 3.887e-07 1.206e-07 3.222 0.001273 **
## quarter2:year2016 -2.921e-02 1.281e-02 -2.281 0.022572 *
## quarter3:year2016 9.430e-02 1.281e-02 7.362 1.89e-13 ***
## quarter4:year2016 2.576e-01 1.281e-02 20.108 < 2e-16 ***
## quarter2:year2017 2.074e-01 1.281e-02 16.187 < 2e-16 ***
## quarter3:year2017 3.116e-01 1.281e-02 24.323 < 2e-16 ***
## quarter4:year2017 2.620e-01 1.270e-02 20.641 < 2e-16 ***
## quarter2:year2018 NA NA NA NA
## quarter3:year2018 NA NA NA NA
## quarter4:year2018 NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2399 on 18181 degrees of freedom
## Multiple R-squared: 0.6463, Adjusted R-squared: 0.645
## F-statistic: 495.8 on 67 and 18181 DF, p-value: < 2.2e-16
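The three `NA` rows for quarter:year2018 ("3 not defined because of singularities") are what `lm()` reports when a coefficient cannot be estimated. Here that is most likely because some quarter and year combinations have no observations at all. It looks as though the 2018 data only cover quarter 1, although that is an inference from the output rather than something checked here. The sketch below reproduces the effect on simulated stand-in data (the `toy` data frame is made up, and its 2018 rows are deliberately restricted to quarter 1) and shows how a cross-tabulation reveals the empty cells.

```r
# Simulated stand-in data; NOT the avocado dataset.
set.seed(7)

# Every quarter is observed in 2015-2017, but only quarter 1 in 2018 (assumed)
cells <- rbind(
  expand.grid(quarter = 1:4, year = 2015:2017),
  data.frame(quarter = 1, year = 2018)
)
toy <- cells[rep(seq_len(nrow(cells)), each = 30), ]
toy$quarter <- factor(toy$quarter)
toy$year    <- factor(toy$year)
toy$price   <- 1 + rnorm(nrow(toy), sd = 0.2)

# Cells with a count of zero are combinations lm() has no data to estimate
cell_counts <- table(toy$quarter, toy$year, dnn = c("quarter", "year"))
print(cell_counts)

fit <- lm(price ~ quarter + year + quarter:year, data = toy)

# Coefficients reported as NA are the ones dropped as not estimable
na_terms <- names(coef(fit))[is.na(coef(fit))]
print(na_terms)
```

Running the same kind of `table()` on `quarter` and `year` in the real data would confirm whether empty cells are the cause. If they are, the NA terms are harmless for the fit itself, but the quarter:year interaction tells us nothing about 2018.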
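# Candidate interaction: quarter:x_large_bags (lets the x_large_bags slope differ by quarter)
model5pi <- lm(average_price ~ type + region + quarter + year + x_large_bags + quarter:x_large_bags, data = trimmed_avocados)
# Show the four diagnostic plots in a 2x2 grid
par(mfrow = c(2, 2))
plot(model5pi)
summary(model5pi)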
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## x_large_bags + quarter:x_large_bags, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.0362 -0.1455 -0.0045 0.1442 1.4394
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.167e+00 1.429e-02 81.659 < 2e-16 ***
## typeorganic 4.981e-01 3.765e-03 132.295 < 2e-16 ***
## regionAtlanta -2.233e-01 1.909e-02 -11.698 < 2e-16 ***
## regionBaltimoreWashington -2.698e-02 1.909e-02 -1.413 0.157567
## regionBoise -2.129e-01 1.909e-02 -11.150 < 2e-16 ***
## regionBoston -3.019e-02 1.909e-02 -1.581 0.113792
## regionBuffaloRochester -4.424e-02 1.909e-02 -2.317 0.020490 *
## regionCalifornia -1.710e-01 1.923e-02 -8.894 < 2e-16 ***
## regionCharlotte 4.497e-02 1.909e-02 2.356 0.018507 *
## regionChicago -4.605e-03 1.909e-02 -0.241 0.809395
## regionCincinnatiDayton -3.521e-01 1.909e-02 -18.440 < 2e-16 ***
## regionColumbus -3.084e-01 1.909e-02 -16.156 < 2e-16 ***
## regionDallasFtWorth -4.758e-01 1.909e-02 -24.922 < 2e-16 ***
## regionDenver -3.425e-01 1.909e-02 -17.938 < 2e-16 ***
## regionDetroit -2.866e-01 1.910e-02 -15.004 < 2e-16 ***
## regionGrandRapids -5.683e-02 1.909e-02 -2.977 0.002919 **
## regionGreatLakes -2.290e-01 1.926e-02 -11.892 < 2e-16 ***
## regionHarrisburgScranton -4.787e-02 1.909e-02 -2.507 0.012170 *
## regionHartfordSpringfield 2.576e-01 1.909e-02 13.491 < 2e-16 ***
## regionHouston -5.134e-01 1.909e-02 -26.891 < 2e-16 ***
## regionIndianapolis -2.473e-01 1.909e-02 -12.953 < 2e-16 ***
## regionJacksonville -5.016e-02 1.909e-02 -2.627 0.008616 **
## regionLasVegas -1.801e-01 1.909e-02 -9.433 < 2e-16 ***
## regionLosAngeles -3.492e-01 1.917e-02 -18.212 < 2e-16 ***
## regionLouisville -2.744e-01 1.909e-02 -14.374 < 2e-16 ***
## regionMiamiFtLauderdale -1.328e-01 1.909e-02 -6.958 3.57e-12 ***
## regionMidsouth -1.577e-01 1.910e-02 -8.258 < 2e-16 ***
## regionNashville -3.490e-01 1.909e-02 -18.281 < 2e-16 ***
## regionNewOrleansMobile -2.567e-01 1.909e-02 -13.447 < 2e-16 ***
## regionNewYork 1.662e-01 1.909e-02 8.705 < 2e-16 ***
## regionNortheast 3.951e-02 1.910e-02 2.069 0.038564 *
## regionNorthernNewEngland -8.371e-02 1.909e-02 -4.385 1.17e-05 ***
## regionOrlando -5.505e-02 1.909e-02 -2.883 0.003941 **
## regionPhiladelphia 7.103e-02 1.909e-02 3.721 0.000199 ***
## regionPhoenixTucson -3.367e-01 1.909e-02 -17.636 < 2e-16 ***
## regionPittsburgh -1.967e-01 1.909e-02 -10.305 < 2e-16 ***
## regionPlains -1.257e-01 1.910e-02 -6.582 4.78e-11 ***
## regionPortland -2.433e-01 1.909e-02 -12.746 < 2e-16 ***
## regionRaleighGreensboro -5.973e-03 1.909e-02 -0.313 0.754386
## regionRichmondNorfolk -2.698e-01 1.909e-02 -14.132 < 2e-16 ***
## regionRoanoke -3.131e-01 1.909e-02 -16.402 < 2e-16 ***
## regionSacramento 6.037e-02 1.909e-02 3.162 0.001568 **
## regionSanDiego -1.630e-01 1.909e-02 -8.536 < 2e-16 ***
## regionSanFrancisco 2.430e-01 1.909e-02 12.728 < 2e-16 ***
## regionSeattle -1.185e-01 1.909e-02 -6.206 5.55e-10 ***
## regionSouthCarolina -1.580e-01 1.909e-02 -8.273 < 2e-16 ***
## regionSouthCentral -4.624e-01 1.912e-02 -24.183 < 2e-16 ***
## regionSoutheast -1.657e-01 1.911e-02 -8.671 < 2e-16 ***
## regionSpokane -1.154e-01 1.909e-02 -6.045 1.53e-09 ***
## regionStLouis -1.306e-01 1.909e-02 -6.841 8.09e-12 ***
## regionSyracuse -4.071e-02 1.909e-02 -2.132 0.032995 *
## regionTampa -1.524e-01 1.909e-02 -7.983 1.52e-15 ***
## regionTotalUS -2.643e-01 2.085e-02 -12.677 < 2e-16 ***
## regionWest -2.896e-01 1.910e-02 -15.166 < 2e-16 ***
## regionWestTexNewMexico -2.969e-01 1.913e-02 -15.516 < 2e-16 ***
## quarter2 8.023e-02 5.472e-03 14.661 < 2e-16 ***
## quarter3 2.180e-01 5.470e-03 39.862 < 2e-16 ***
## quarter4 1.620e-01 5.440e-03 29.780 < 2e-16 ***
## year2016 -3.793e-02 4.696e-03 -8.079 6.94e-16 ***
## year2017 1.375e-01 4.681e-03 29.369 < 2e-16 ***
## year2018 8.566e-02 8.383e-03 10.219 < 2e-16 ***
## x_large_bags 2.976e-07 2.196e-07 1.355 0.175445
## quarter2:x_large_bags 1.247e-07 2.852e-07 0.437 0.661831
## quarter3:x_large_bags 5.626e-08 2.654e-07 0.212 0.832142
## quarter4:x_large_bags 2.886e-08 4.411e-07 0.065 0.947840
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2482 on 18184 degrees of freedom
## Multiple R-squared: 0.6214, Adjusted R-squared: 0.6201
## F-statistic: 466.4 on 64 and 18184 DF, p-value: < 2.2e-16
model5pj <- lm(average_price ~ type + region + quarter + year + x_large_bags + year:x_large_bags, data = trimmed_avocados)
par(mfrow = c(2, 2))
plot(model5pj)
summary(model5pj)
##
## Call:
## lm(formula = average_price ~ type + region + quarter + year +
## x_large_bags + year:x_large_bags, data = trimmed_avocados)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.03659 -0.14579 -0.00433 0.14385 1.44061
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.168e+00 1.429e-02 81.749 < 2e-16 ***
## typeorganic 4.974e-01 3.764e-03 132.154 < 2e-16 ***
## regionAtlanta -2.234e-01 1.909e-02 -11.704 < 2e-16 ***
## regionBaltimoreWashington -2.699e-02 1.909e-02 -1.414 0.157396
## regionBoise -2.129e-01 1.909e-02 -11.152 < 2e-16 ***
## regionBoston -3.020e-02 1.909e-02 -1.582 0.113661
## regionBuffaloRochester -4.425e-02 1.909e-02 -2.318 0.020447 *
## regionCalifornia -1.711e-01 1.919e-02 -8.919 < 2e-16 ***
## regionCharlotte 4.496e-02 1.909e-02 2.356 0.018508 *
## regionChicago -4.466e-03 1.909e-02 -0.234 0.815010
## regionCincinnatiDayton -3.516e-01 1.909e-02 -18.420 < 2e-16 ***
## regionColumbus -3.080e-01 1.909e-02 -16.136 < 2e-16 ***
## regionDallasFtWorth -4.756e-01 1.909e-02 -24.916 < 2e-16 ***
## regionDenver -3.424e-01 1.909e-02 -17.941 < 2e-16 ***
## regionDetroit -2.846e-01 1.911e-02 -14.895 < 2e-16 ***
## regionGrandRapids -5.664e-02 1.909e-02 -2.967 0.003014 **
## regionGreatLakes -2.233e-01 1.934e-02 -11.543 < 2e-16 ***
## regionHarrisburgScranton -4.784e-02 1.909e-02 -2.506 0.012213 *
## regionHartfordSpringfield 2.576e-01 1.909e-02 13.494 < 2e-16 ***
## regionHouston -5.131e-01 1.909e-02 -26.880 < 2e-16 ***
## regionIndianapolis -2.468e-01 1.909e-02 -12.931 < 2e-16 ***
## regionJacksonville -5.016e-02 1.909e-02 -2.628 0.008599 **
## regionLasVegas -1.801e-01 1.909e-02 -9.435 < 2e-16 ***
## regionLosAngeles -3.490e-01 1.915e-02 -18.229 < 2e-16 ***
## regionLouisville -2.742e-01 1.909e-02 -14.368 < 2e-16 ***
## regionMiamiFtLauderdale -1.328e-01 1.909e-02 -6.959 3.55e-12 ***
## regionMidsouth -1.574e-01 1.909e-02 -8.244 < 2e-16 ***
## regionNashville -3.490e-01 1.909e-02 -18.284 < 2e-16 ***
## regionNewOrleansMobile -2.568e-01 1.909e-02 -13.454 < 2e-16 ***
## regionNewYork 1.661e-01 1.909e-02 8.703 < 2e-16 ***
## regionNortheast 3.946e-02 1.909e-02 2.067 0.038749 *
## regionNorthernNewEngland -8.372e-02 1.909e-02 -4.386 1.16e-05 ***
## regionOrlando -5.503e-02 1.909e-02 -2.883 0.003940 **
## regionPhiladelphia 7.103e-02 1.909e-02 3.721 0.000199 ***
## regionPhoenixTucson -3.367e-01 1.909e-02 -17.642 < 2e-16 ***
## regionPittsburgh -1.967e-01 1.909e-02 -10.307 < 2e-16 ***
## regionPlains -1.253e-01 1.909e-02 -6.565 5.33e-11 ***
## regionPortland -2.433e-01 1.909e-02 -12.745 < 2e-16 ***
## regionRaleighGreensboro -5.975e-03 1.909e-02 -0.313 0.754243
## regionRichmondNorfolk -2.698e-01 1.909e-02 -14.135 < 2e-16 ***
## regionRoanoke -3.131e-01 1.909e-02 -16.406 < 2e-16 ***
## regionSacramento 6.033e-02 1.909e-02 3.161 0.001576 **
## regionSanDiego -1.630e-01 1.909e-02 -8.539 < 2e-16 ***
## regionSanFrancisco 2.430e-01 1.909e-02 12.729 < 2e-16 ***
## regionSeattle -1.185e-01 1.909e-02 -6.207 5.53e-10 ***
## regionSouthCarolina -1.580e-01 1.909e-02 -8.278 < 2e-16 ***
## regionSouthCentral -4.616e-01 1.911e-02 -24.147 < 2e-16 ***
## regionSoutheast -1.660e-01 1.911e-02 -8.687 < 2e-16 ***
## regionSpokane -1.154e-01 1.909e-02 -6.046 1.51e-09 ***
## regionStLouis -1.306e-01 1.909e-02 -6.842 8.07e-12 ***
## regionSyracuse -4.071e-02 1.909e-02 -2.133 0.032948 *
## regionTampa -1.524e-01 1.909e-02 -7.984 1.50e-15 ***
## regionTotalUS -2.574e-01 2.084e-02 -12.350 < 2e-16 ***
## regionWest -2.895e-01 1.909e-02 -15.163 < 2e-16 ***
## regionWestTexNewMexico -2.967e-01 1.913e-02 -15.509 < 2e-16 ***
## quarter2 8.056e-02 5.411e-03 14.887 < 2e-16 ***
## quarter3 2.183e-01 5.415e-03 40.325 < 2e-16 ***
## quarter4 1.626e-01 5.378e-03 30.244 < 2e-16 ***
## year2016 -3.908e-02 4.749e-03 -8.229 < 2e-16 ***
## year2017 1.354e-01 4.739e-03 28.580 < 2e-16 ***
## year2018 8.441e-02 8.477e-03 9.957 < 2e-16 ***
## x_large_bags -1.140e-06 5.468e-07 -2.085 0.037091 *
## year2016:x_large_bags 1.419e-06 5.571e-07 2.547 0.010880 *
## year2017:x_large_bags 1.642e-06 5.537e-07 2.966 0.003023 **
## year2018:x_large_bags 1.461e-06 5.948e-07 2.456 0.014054 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2481 on 18184 degrees of freedom
## Multiple R-squared: 0.6216, Adjusted R-squared: 0.6203
## F-statistic: 466.8 on 64 and 18184 DF, p-value: < 2.2e-16
So it looks like model5pa, with type, region, quarter, year, x_large_bags and type:region, is the best, with a moderate gain in multiple-\(r^2\) due to the interaction. However, the \(p\)-values of the individual interaction coefficients vary, so we need to test the significance of the interaction as a whole.
anova(model5, model5pa)
Neat, it looks like including the interaction is statistically justified.
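For intuition, here is a minimal sketch of what anova() does when it compares two nested lm models. It uses the built-in mtcars data as a stand-in (an assumption for illustration only; it is not the avocado data), and the names small, big, f_manual and p_manual are hypothetical. The F statistic is rebuilt by hand from the residual sums of squares and should agree with the anova() table.

```r
# Illustrative only: mtcars stands in for trimmed_avocados
small <- lm(mpg ~ wt + factor(cyl), data = mtcars)  # main effects only
big <- lm(mpg ~ wt * factor(cyl), data = mtcars)    # adds the wt:cyl interaction

comparison <- anova(small, big)

# Rebuild the F statistic by hand from the residual sums of squares
rss_small <- sum(residuals(small)^2)
rss_big <- sum(residuals(big)^2)
df_extra <- df.residual(small) - df.residual(big)   # number of extra coefficients
f_manual <- ((rss_small - rss_big) / df_extra) / (rss_big / df.residual(big))
p_manual <- pf(f_manual, df_extra, df.residual(big), lower.tail = FALSE)

comparison
c(f_manual = f_manual, p_manual = p_manual)
```

A small p-value says that the extra interaction terms jointly reduce the residual sum of squares by more than we would expect by chance, which is the reasoning applied to model5 and model5pa.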
Let’s try to fit a predictive model using glmulti().
library(glmulti)
## Loading required package: rJava
This data is pretty big for glmulti on a single CPU core, so we’ll likely not be able to do a search simultaneously for both main effects and pairwise interactions. Let’s look first for the best main effects model using BIC as our metric:
# we're putting set.seed() in here for reproducibility, but you shouldn't include
# this in production code
set.seed(42)
n_data <- nrow(trimmed_avocados)
test_index <- sample(1:n_data, size = n_data * 0.2)
test <- slice(trimmed_avocados, test_index)
train <- slice(trimmed_avocados, -test_index)
# sanity check
nrow(test) + nrow(train) == n_data
## [1] TRUE
nrow(test)
## [1] 3649
nrow(train)
## [1] 14600
glmulti_fit <- glmulti(
average_price ~ .,
data = train,
level = 1, # 2 = include pairwise interactions, 1 = main effects only (main effect = no pairwise interactions)
minsize = 1, # no min size of model
maxsize = -1, # -1 = no max size of model
marginality = TRUE, # marginality here means the same as 'strongly hierarchical' interactions, i.e. include pairwise interactions only if both predictors present in the model as main effects.
method = "h", # try exhaustive search, or could use "g" for genetic algorithm instead
crit = bic, # criteria for model selection is BIC value (lower is better)
plotty = FALSE, # don't plot models as function runs
report = TRUE, # do produce reports as function runs
confsetsize = 10, # return best 10 solutions
fitfunction = lm # fit using the `lm` function
)
## Initialization...
## TASK: Exhaustive screening of candidate set.
## Fitting...
##
## After 50 models:
## Best model: average_price~1+total_volume+x4225+x4770+small_bags
## Crit= 14290.9309630974
## Mean crit= 14302.7404848229
##
## After 100 models:
## Best model: average_price~1+total_volume+x4046+x4770+large_bags
## Crit= 14287.0201451487
## Mean crit= 14295.0164087599
##
## After 150 models:
## Best model: average_price~1+x4046+x4225+x4770+x_large_bags
## Crit= 14282.9391871136
## Mean crit= 14288.655212267
##
## After 200 models:
## Best model: average_price~1+total_volume+x4225+x4770+small_bags+x_large_bags
## Crit= 14279.4193694914
## Mean crit= 14287.2170591254
##
## After 250 models:
## Best model: average_price~1+total_volume+x4225+x4770+small_bags+x_large_bags
## Crit= 14279.4193694914
## Mean crit= 14285.8311251354
##
## After 300 models:
## Best model: average_price~1+x4225+region
## Crit= 11937.9201055248
## Mean crit= 11948.2560227565
##
## After 350 models:
## Best model: average_price~1+x4225+region
## Crit= 11937.9201055248
## Mean crit= 11946.7914993621
##
## After 400 models:
## Best model: average_price~1+total_volume+x4046+x_large_bags+region
## Crit= 11925.5711979638
## Mean crit= 11936.6091736735
##
## After 450 models:
## Best model: average_price~1+total_volume+x4225+x_large_bags+region
## Crit= 11921.550556659
## Mean crit= 11926.6655997315
##
## After 500 models:
## Best model: average_price~1+total_volume+x4225+x_large_bags+region
## Crit= 11921.550556659
## Mean crit= 11926.6075265317
##
## After 550 models:
## Best model: average_price~1+type+x4046+x4225+x4770
## Crit= 7734.8593327961
## Mean crit= 7793.00094370359
##
## After 600 models:
## Best model: average_price~1+type+total_volume+x4225+x4770+small_bags
## Crit= 7697.79186941605
## Mean crit= 7707.77632276868
##
## After 650 models:
## Best model: average_price~1+type+total_volume+x4046+x4225+x4770+small_bags+large_bags
## Crit= 7665.37294598611
## Mean crit= 7691.99478130502
##
## After 700 models:
## Best model: average_price~1+type+total_volume+x4046+x4225+total_bags+small_bags+large_bags
## Crit= 7665.32155031575
## Mean crit= 7671.67032729922
##
## After 750 models:
## Best model: average_price~1+type+total_volume+x4225+x4770+small_bags+x_large_bags
## Crit= 7657.97738881932
## Mean crit= 7664.22564265621
##
## After 800 models:
## Best model: average_price~1+type+total_volume+x4225+region
## Crit= 3977.52101108293
## Mean crit= 5088.37870955926
##
## After 850 models:
## Best model: average_price~1+type+total_volume+small_bags+region
## Crit= 3964.67515907674
## Mean crit= 3970.85743694413
##
## After 900 models:
## Best model: average_price~1+type+total_volume+small_bags+region
## Crit= 3964.67515907674
## Mean crit= 3969.31227685631
##
## After 950 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3925.01550491875
##
## After 1000 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.49480426776
##
## After 1050 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
##
## After 1100 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
##
## After 1150 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
##
## After 1200 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
##
## After 1250 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
##
## After 1300 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
##
## After 1350 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
##
## After 1400 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
##
## After 1450 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
##
## After 1500 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
##
## After 1550 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
##
## After 1600 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
##
## After 1650 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
##
## After 1700 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
##
## After 1750 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
##
## After 1800 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
##
## After 1850 models:
## Best model: average_price~1+type+x4225+x_large_bags+region
## Crit= 3918.18091232781
## Mean crit= 3924.14806552202
##
## After 1900 models:
## Best model: average_price~1+type+year+x4225+region
## Crit= 2782.77393470439
## Mean crit= 2786.76963771313
##
## After 1950 models:
## Best model: average_price~1+type+year+x4225+region
## Crit= 2782.77393470439
## Mean crit= 2785.83493132762
##
## After 2000 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2749.79339051662
##
## After 2050 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2747.16085350457
##
## After 2100 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
##
## After 2150 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
##
## After 2200 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
##
## After 2250 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
##
## After 2300 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
##
## After 2350 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
##
## After 2400 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
##
## After 2450 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
##
## After 2500 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
##
## After 2550 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
##
## After 2600 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
##
## After 2650 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
##
## After 2700 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
##
## After 2750 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
##
## After 2800 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
##
## After 2850 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
##
## After 2900 models:
## Best model: average_price~1+type+year+total_volume+x_large_bags+region
## Crit= 2740.75749578209
## Mean crit= 2745.77695032743
##
## After 2950 models:
## Best model: average_price~1+type+quarter+x4225+small_bags+region
## Crit= 2606.2768667676
## Mean crit= 2614.89356850007
##
## After 3000 models:
## Best model: average_price~1+type+quarter+x4225+small_bags+region
## Crit= 2606.2768667676
## Mean crit= 2612.22085451977
##
## After 3050 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2593.56894619082
##
## After 3100 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.63505461122
##
## After 3150 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
##
## After 3200 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
##
## After 3250 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
##
## After 3300 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
##
## After 3350 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
##
## After 3400 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
##
## After 3450 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
##
## After 3500 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
##
## After 3550 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
##
## After 3600 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
##
## After 3650 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
##
## After 3700 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
##
## After 3750 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
##
## After 3800 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
##
## After 3850 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
##
## After 3900 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
##
## After 3950 models:
## Best model: average_price~1+type+quarter+x4225+x_large_bags+region
## Crit= 2578.04076821456
## Mean crit= 2586.34438169224
##
## After 4000 models:
## Best model: average_price~1+type+year+quarter+x4225+x4770+region
## Crit= 1373.42931760425
## Mean crit= 1379.26585957081
##
## After 4050 models:
## Best model: average_price~1+type+year+quarter+x4225+x4770+region
## Crit= 1373.42931760425
## Mean crit= 1377.67893634087
##
## After 4100 models:
## Best model: average_price~1+type+year+quarter+total_bags+small_bags+large_bags+region
## Crit= 1364.21555034686
## Mean crit= 1371.92259424323
##
## After 4150 models:
## Best model: average_price~1+type+year+quarter+total_volume+x_large_bags+region
## Crit= 1355.81959712867
## Mean crit= 1360.09536364159
##
## After 4200 models:
## Best model: average_price~1+type+year+quarter+total_volume+x_large_bags+region
## Crit= 1355.81959712867
## Mean crit= 1358.68681119117
##
## After 4250 models:
## Best model: average_price~1+type+year+quarter+total_volume+x_large_bags+region
## Crit= 1355.81959712867
## Mean crit= 1358.68681119117
## Completed.
summary(glmulti_fit)
## $name
## [1] "glmulti.analysis"
##
## $method
## [1] "h"
##
## $fitting
## [1] "lm"
##
## $crit
## [1] "bic"
##
## $level
## [1] 1
##
## $marginality
## [1] TRUE
##
## $confsetsize
## [1] 10
##
## $bestic
## [1] 1355.82
##
## $icvalues
## [1] 1355.820 1356.338 1356.942 1358.332 1359.273 1359.330 1359.344 1360.079
## [9] 1360.619 1360.791
##
## $bestmodel
## [1] "average_price ~ 1 + type + year + quarter + total_volume + x_large_bags + "
## [2] " region"
##
## $modelweights
## [1] 0.29050767 0.22415209 0.16577867 0.08272938 0.05166463 0.05021337
## [7] 0.04988139 0.03452781 0.02636050 0.02418450
##
## $includeobjects
## [1] TRUE
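As a quick check on how to read $modelweights, the sketch below recomputes them from the $icvalues printed above. It assumes the usual information-criterion weight formula (the relative likelihood exp(-delta / 2) of each model, rescaled to sum to 1); we have not checked glmulti's source, but this formula reproduces the reported numbers to within rounding. The variable names are hypothetical.

```r
# IC values copied from the $icvalues output above (rounded to 3 decimal places)
ic_values <- c(1355.820, 1356.338, 1356.942, 1358.332, 1359.273,
               1359.330, 1359.344, 1360.079, 1360.619, 1360.791)

# Model weights copied from the $modelweights output above
reported_weights <- c(0.29050767, 0.22415209, 0.16577867, 0.08272938, 0.05166463,
                      0.05021337, 0.04988139, 0.03452781, 0.02636050, 0.02418450)

# Relative likelihood of each model compared with the best one,
# rescaled so that the weights sum to 1
delta <- ic_values - min(ic_values)
weights_by_hand <- exp(-delta / 2) / sum(exp(-delta / 2))

round(weights_by_hand, 4)
max(abs(weights_by_hand - reported_weights))  # small; differences come from rounding
```

Note that the best model only carries a weight of about 0.29 within this set of ten, so several of the runner-up models are close competitors in BIC terms.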
So the lowest BIC model with main effects is average_price ~ type + year + quarter + total_volume + x_large_bags + region. Let’s have a look at possible extensions to this. We’re going to deliberately push on to the point where models start to overfit (as tested by the RMSE on the test set), so that we can see what this looks like.
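# start with an empty tibble with typed columns, so that add_row() can append to it
results <- tibble(
name = character(), bic = numeric(), rmse_train = numeric(), rmse_test = numeric()
)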
# lowest BIC model with main effects
lowest_bic_model <- lm(average_price ~ type + year + quarter + total_volume + x_large_bags + region, data = train)
results <- results %>%
add_row(
tibble_row(
name = "lowest bic",
bic = bic(lowest_bic_model),
rmse_train = rmse(lowest_bic_model, train),
rmse_test = rmse(lowest_bic_model, test)
)
)
# try adding in all possible pairs with these main effects
lowest_bic_model_all_pairs <- lm(average_price ~ (type + year + quarter + total_volume + x_large_bags + region)^2, data = train)
results <- results %>%
add_row(
tibble_row(
name = "lowest bic all pairs",
bic = bic(lowest_bic_model_all_pairs),
rmse_train = rmse(lowest_bic_model_all_pairs, train),
rmse_test = rmse(lowest_bic_model_all_pairs, test)
)
)
## Warning in predict.lm(model, data): prediction from a rank-deficient fit may be
## misleading
## Warning in predict.lm(model, data): prediction from a rank-deficient fit may be
## misleading
# try a model with all main effects
model_all_mains <- lm(average_price ~ ., data = train)
results <- results %>%
add_row(
tibble_row(
name = "all mains",
bic = bic(model_all_mains),
rmse_train = rmse(model_all_mains, train),
rmse_test = rmse(model_all_mains, test)
)
)
# try a model with all main effects and all pairs
model_all_pairs <- lm(average_price ~ .^2, data = train)
results <- results %>%
add_row(
tibble_row(
name = "all pairs",
bic = bic(model_all_pairs),
rmse_train = rmse(model_all_pairs, train),
rmse_test = rmse(model_all_pairs, test)
)
)
## Warning in predict.lm(model, data): prediction from a rank-deficient fit may be
## misleading
## Warning in predict.lm(model, data): prediction from a rank-deficient fit may be
## misleading
# try a model with all main effects, all pairs and one triple (this is getting silly)
model_all_pairs_one_triple <- lm(average_price ~ .^2 + region:type:year, data = train)
results <- results %>%
add_row(
tibble_row(
name = "all pairs one triple",
bic = bic(model_all_pairs_one_triple),
rmse_train = rmse(model_all_pairs_one_triple, train),
rmse_test = rmse(model_all_pairs_one_triple, test)
)
)
## Warning in predict.lm(model, data): prediction from a rank-deficient fit may be
## misleading
## Warning in predict.lm(model, data): prediction from a rank-deficient fit may be
## misleading
# try a model with all main effects, all pairs and multiple triples (more silly)
model_all_pairs_multi_triples <- lm(average_price ~ .^2 + region:type:year + region:type:quarter + region:year:quarter, data = train)
results <- results %>%
add_row(
tibble_row(
name = "all pairs multi triples",
bic = bic(model_all_pairs_multi_triples),
rmse_train = rmse(model_all_pairs_multi_triples, train),
rmse_test = rmse(model_all_pairs_multi_triples, test)
)
)
## Warning in predict.lm(model, data): prediction from a rank-deficient fit may be
## misleading
## Warning in predict.lm(model, data): prediction from a rank-deficient fit may be
## misleading
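The 'rank-deficient fit' warnings above mean that lm() could not estimate every coefficient it was asked for. This typically happens when some terms are exact linear combinations of others, or when an interaction involves combinations of levels with no data. We have not checked which terms are responsible here. A minimal sketch of the effect, using a made-up toy data set (all names hypothetical):

```r
# Illustrative only: a deliberately collinear toy data set
toy <- data.frame(
  y = c(1.0, 2.1, 2.9, 4.2, 5.1, 5.8),
  x1 = 1:6
)
toy$x2 <- 2 * toy$x1  # exact linear copy of x1, so the design matrix loses rank

toy_fit <- lm(y ~ x1 + x2, data = toy)

coef(toy_fit)    # the coefficient for x2 is NA because it cannot be estimated
toy_fit$rank     # rank of the fit is lower than the number of coefficients requested
```

On our own models, something like `names(which(is.na(coef(model_all_pairs))))` would list the terms that could not be estimated.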
results <- results %>%
pivot_longer(cols = bic:rmse_test, names_to = "measure", values_to = "value") %>%
mutate(
name = fct_relevel(
as_factor(name),
"lowest bic", "all mains", "lowest bic all pairs", "all pairs", "all pairs one triple", "all pairs multi triples"
)
)
results %>%
filter(measure == "bic") %>%
ggplot(aes(x = name, y = value)) +
geom_col(fill = "steelblue", alpha = 0.7) +
labs(
x = "model",
y = "bic"
) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_hline(aes(yintercept = 0))
BIC is telling us here that if we took our main effects model with lowest BIC, and added in all possible pairwise interactions, this would likely still improve the model for predictive purposes. BIC suggests that this ‘lowest bic all pairs’ model will offer the best predictive performance without overfitting, with all other models scoring considerably worse.
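As a reminder of why BIC can prefer a smaller model, here is a minimal sketch that rebuilds it by hand as \(-2 \log L + k \log n\), where \(k\) is the number of estimated parameters and \(n\) the number of observations. It uses base R's BIC() and the built-in mtcars data as a stand-in (assumptions for illustration only; glmulti's bic is not used here), and the variable names are hypothetical.

```r
# Illustrative only: mtcars stands in for the avocado training data
toy_model <- lm(mpg ~ wt + hp, data = mtcars)

n_obs <- nobs(toy_model)
n_params <- attr(logLik(toy_model), "df")  # coefficients plus the residual variance

# Goodness of fit (first term) plus a complexity penalty (second term)
bic_manual <- -2 * as.numeric(logLik(toy_model)) + n_params * log(n_obs)

c(manual = bic_manual, builtin = BIC(toy_model))
```

Each extra parameter costs \(\log n\), so a more complex model only wins on BIC if its log-likelihood improves by enough to pay that penalty.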
Let’s compare the RMSE values of the various models for train and test sets. We expect train RMSE always to go down as model complexity increases, but what happens to the test RMSE as models get more complex?
results %>%
filter(measure != "bic") %>%
ggplot(aes(x = name, y = value, fill = measure)) +
geom_col(position = "dodge", alpha = 0.7) +
labs(
x = "model",
y = "rmse"
) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
The lowest test RMSE is obtained for the ‘lowest bic all pairs’ model. It increases thereafter for the more complex models, which suggests that these models are overfitting the training data.
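To make the train versus test comparison concrete, here is a minimal base R sketch of the same idea. It assumes the built-in mtcars data as a stand-in for trimmed_avocados, and rmse_by_hand() is a hypothetical helper that computes the root mean squared prediction error directly (which, as far as we understand, is what modelr's rmse() reports).

```r
# Illustrative only: mtcars stands in for trimmed_avocados
set.seed(42)
n_rows <- nrow(mtcars)
test_rows <- sample(seq_len(n_rows), size = floor(0.2 * n_rows))
toy_test <- mtcars[test_rows, ]
toy_train <- mtcars[-test_rows, ]

# Hypothetical helper: root mean squared error of predictions on a data set
rmse_by_hand <- function(model, data, response = "mpg") {
  sqrt(mean((data[[response]] - predict(model, newdata = data))^2))
}

simple <- lm(mpg ~ wt, data = toy_train)
complex <- lm(mpg ~ wt * hp * disp, data = toy_train)  # nests the simple model

data.frame(
  model = c("simple", "complex"),
  rmse_train = c(rmse_by_hand(simple, toy_train), rmse_by_hand(complex, toy_train)),
  rmse_test = c(rmse_by_hand(simple, toy_test), rmse_by_hand(complex, toy_test))
)
```

The train RMSE of the larger nested model can never be higher than that of the simpler one, whereas the test RMSE carries no such guarantee, which is why we use it to spot overfitting.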